Christine Mann Darden first passed through the gates of NASA's Langley Research Center
in Hampton, Virginia, in the summer of 1967. Her newly minted master's degree in
applied math had earned her a position as a data analyst there. In the city of Hampton, 
and across the United States, tensions were running high. In Los Angeles, a massive 
protest against the Vietnam War ended only when more than a thousand armed police 
officers attacked the peaceful protestors. One month later, even more violence engulfed 
the city of Detroit after a police raid spiraled out of control. The 1967 Detroit Riot, or 
the 1967 Detroit Rebellion, as it is increasingly known, resulted in over forty deaths 
and one thousand injuries. 

The gates of Langley might have shielded Darden from those physical confrontations,
but her work there was no less removed from the national stage. By 1967, the
space race was well underway, and the United States was losing. The Soviet Union had 
already sent a man into space and a rocket to the moon. The only thing standing in 
the way of a Soviet victory was to put those two pieces together. Meanwhile, the United 
States had suffered a series of defeats—and, in January of that year, an outright disaster,
when a sudden fire during a launch test of the Apollo 1 spacecraft killed all three
astronauts on board. 

While the nation mourned, everyone at NASA threw themselves back into their 
work—including Darden, who began her data analyst job in the immediate aftermath 
of the Apollo 1 disaster. Two years later, it would be her precise analysis of the physics of 
rocket reentry that would help to ensure the successful return of the Apollo 11 mission 
from the moon, effectively winning the space race for the United States. But it would 
be Darden herself, as a Black woman with technical expertise, working at a federal 
agency in which sexism and racism openly prevailed, who demonstrated that the ideological
mission of the United States—as a land based on the ideals of liberty, equality,
and opportunity for all—was far from accomplished (figure 0.1). 
Figure 0.1

Christine Darden in the control room of the Unitary Plan Wind Tunnel at NASA's Langley 
Research Center in 1975. Courtesy of NASA. 

The 1960s, after all, were years of social protest and transformation as well as exploration
into outer space. Darden herself had participated in several lunch-counter sit-ins
at the Hampton Institute, the historically Black college that she attended for her
undergraduate studies. 1 By the time that Darden joined NASA, its Virginia facility had 
been officially desegregated for several years. But it had yet to reckon with the unofficial
segregation of both race and gender that remained in place—particularly among
its women employees known as computers. 

Darden’s arrival at Langley coincided with the early days of digital computing. 
Although Langley could claim one of the most advanced computing systems of the 
time—an IBM 704, the first mass-produced computer with floating-point hardware—its resources
were still limited. For most data analysis tasks, Langley's Advanced Computing Division 
relied upon human computers like Darden herself. These computers were all women,
trained in math or a related field, and tasked with performing the calculations that 
determined everything from the best wing shape for an airplane to the best flight path
to the moon. But despite the crucial roles they played in advancing this and other 
NASA research, they were treated like unskilled temporary workers. They were brought 
into research groups on a project-by-project basis, often without even being told anything
about the source of the data they were asked to analyze. Most of the engineers,
who were predominantly men, never even bothered to learn the computers' names. 

These women computers have only recently begun to receive credit for their crucial 
work, thanks to scholars of the history of computing—and to journalists like Margot 
Lee Shetterly, whose book, Hidden Figures: The American Dream and the Untold Story of the 
Black Women Who Helped Win the Space Race, along with its film adaptation, is responsible
for bringing Christine Darden's story into the public eye. 2 Her story, like those
of her colleagues, is one of hard work under discriminatory conditions. Each of these 
women computers was required to advocate for herself—and some, like Darden, chose 
also to advocate for others. It is because of both her contributions to data science and 
her advocacy for women that we have chosen to begin our book, Data Feminism, with 
Darden's story. For feminism begins with a belief in the "political, social, and economic 
equality of the sexes," as the Merriam-Webster Dictionary defines the term—as does, for 
the record, Beyoncé. 3 And any definition of feminism also necessarily includes the
activist work that is required to turn that belief into reality. In Data Feminism, we bring 
these two aspects of feminism together, demonstrating a way of thinking about data, 
their analysis, and their display, that is informed by this tradition of feminist activism 
as well as the legacy of feminist critical thought. 

As for Darden, she did not only apply her skills of data analysis to spaceflight trajectories;
she also applied them to her own career path. After working at Langley for a
number of years, she began to notice two distinct patterns in her workplace: men with 
math credentials were placed in engineering positions, where they could be promoted 
through the ranks of the civil service, while women with the same degrees were sent 
to the computing pools, where they languished until they retired or quit. She did not 
want to become one of those women, nor did she want others to experience the same 
fate. So she gathered up her courage and decided to approach the chief of her division 
to ask him why. As Darden, now seventy-five, told Shetterly in an interview for Hidden 
Figures, his response was sobering: "Well, nobody's ever complained," he told Darden. 
"The women seem to be happy doing that, so that's just what they do." 

In today's world, Darden might have gotten her boss fired—or at least served with 
an Equal Employment Opportunity Commission complaint. But at the time that 
Darden posed her question, stereotypical remarks about "what women do" were par
for the course. In fact, challenging assumptions about what women could or couldn't 
do—especially in the workplace—was the central subject of Betty Friedan's best-selling 
book, The Feminine Mystique. Published in 1963, The Feminine Mystique is often credited 
with starting feminism's so-called second wave. 4 Fed up with the enforced return to 
domesticity following the end of World War II, and inspired by the national conversation
about equality of opportunity prompted by the civil rights movement, women
across the United States began to organize around a wide range of issues, including 
reproductive rights and domestic violence, as well as the workplace inequality and 
restrictive gender roles that Darden faced at Langley. 

That said, Darden's specific experience as a Black woman with a full-time job was 
quite different from that of a white suburban housewife—the central focus of The Feminine
Mystique. And when critics rightly called out Friedan for failing to acknowledge the
range of experiences of women in the United States (and abroad), it was women like 
Darden, among many others, whom they had in mind. In Feminist Theory: From Margin 
to Center, another landmark feminist book published in 1984, bell hooks puts it plainly: 
"[Friedan] did not discuss who would be called in to take care of the children and 
maintain the home if more women like herself were freed from their house labor and 
given equal access with white men to the professions. She did not speak of the needs of 
women without men, without children, without homes. She ignored the existence of 
all non-white women and poor white women. She did not tell readers whether it was 
more fulfilling to be a maid, a babysitter, a factory worker, a clerk, or a prostitute than 
to be a leisure-class housewife." 5

In other words, Friedan had failed to consider how those additional dimensions 
of individual and group identity—like race and class, not to mention sexuality, ability,
age, religion, and geography, among many others—intersect with each other to
determine one's experience in the world. Although this concept—intersectionality—did
not have a name when hooks described it, the idea that these dimensions cannot be 
examined in isolation from each other has a much longer intellectual history. 6 Then, 
as now, key scholars and activists were deeply attuned to how the racism embedded 
in US culture, coupled with many other forms of oppression, made it impossible to 
claim a common experience—or a common movement—for all women everywhere. 
Instead, what was needed was "the development of integrated analysis and practice 
based upon the fact that the major systems of oppression are interlocking.” 7 These 
words are from the Combahee River Collective Statement, written in 1978 by the 
famed Black feminist activist group out of Boston. In this book, we draw heavily from 
intersectionality and other concepts developed through the work of Black feminist 
scholars and activists because they offer some of the best ways for negotiating this
multidimensional terrain. 

Indeed, feminism must be intersectional if it seeks to address the challenges of the 
present moment. We write as two straight, white women based in the United States, 
with four advanced degrees and five kids between us. We identify as middle-class and 
cisgender—meaning that our gender identity matches the sex that we were assigned at
birth. We have experienced sexism in various ways at different points of our lives— 
being women in tech and academia, birthing and breastfeeding babies, and trying to 
advocate for ourselves and our bodies in a male-dominated health care system. But we 
haven't experienced sexism in ways that other women certainly have or that nonbinary
people have, for there are many dimensions of our shared identity, as the authors
of this book, that align with dominant group positions. This fact makes it impossible 
for us to speak from experience about some oppressive forces—racism, for example. But 
it doesn't make it impossible for us to educate ourselves and then speak about racism 
and the role that white people play in upholding it. Or to challenge ableism and the 
role that abled people play in upholding it. Or to speak about class and wealth inequalities
and the role that well-educated, well-off people play in maintaining those. Or to
believe in the logic of co-liberation. Or to advocate for justice through equity. Indeed, a 
central aim of this book is to describe a form of intersectional feminism that takes the 
inequities of the present moment as its starting point and begins its own work by asking:
How can we use data to remake the world? 8

This is a complex and weighty task, and it will necessarily remain unfinished. But 
its size and scope need not stop us—or you, the readers of this book—from taking additional
steps toward justice. Consider Christine Darden, who, after speaking up to her
division chief, heard nothing from him but radio silence. But then, two weeks later, 
she was indeed promoted and transferred to a group focused on sonic boom research. 
In her new position, Darden was able to begin directing her own research projects and 
collaborate with colleagues of all genders as a peer. Her self-advocacy serves as a model: 
a sustained attention to how systems of oppression intersect with each other, informed 
by the knowledge that comes from direct experience. It offers a guide for challenging 
power and working toward justice. 

What Is Data Feminism? 

Christine Darden would go on to conduct groundbreaking research on sonic boom 
minimization techniques, author more than sixty scientific papers in the field of computational
fluid dynamics, and earn her PhD in mechanical engineering—all while
"juggling the duties of Girl Scout mom, Sunday school teacher, trips to music lessons,
and homemaker," Shetterly reports. But even as she ascended the professional ranks, 
she could tell that her scientific accomplishments were still not being recognized as 
readily as those of her male counterparts; the men, it seemed, received promotions far 
more quickly. 

Darden consulted with Langley's Equal Opportunity Office, where a white woman 
by the name of Gloria Champine had been compiling a set of statistics about gender
and rank. The data confirmed Darden's direct experience: that women and men—
even those with identical academic credentials, publication records, and performance 
reviews—were promoted at vastly different rates. Champine recognized that her data 
could support Darden in her pursuit of a promotion and, furthermore, that these data 
could help communicate the systemic nature of the problem at hand. Champine visualized
the data in the form of a bar chart, and presented the chart to the director of
Darden’s division. 9 He was "shocked at the disparity," Shetterly reports, and Darden 
received the promotion she had long deserved. 10 Darden would advance to the top 
rank in the federal civil service, the first Black woman at Langley to do so. By the time 
that she retired from NASA, in 2007, Darden was a director herself. 11 

Although Darden's rise into the leadership ranks at NASA was largely the result of her 
own knowledge, experience, and grit, her story is one that we can only tell as a result 
of the past several decades of feminist activism and critical thought. It was a national 
feminist movement that brought women's issues to the forefront of US cultural politics,
and the changes brought about by that movement were vast. They included both
the shifting gender roles that pointed Darden in the direction of employment at NASA 
and the creation of reporting mechanisms like the one that enabled her to continue 
her professional rise. But Darden's success in the workplace was also, presumably, the 
result of many unnamed colleagues and friends who may or may not have considered 
themselves feminists. These were the people who provided her with community and 
support—and likely a not insignificant number of casserole dinners—as she ascended 
the government ranks. These types of collective efforts have been made increasingly
legible, in turn, because of the feminist scholars and activists whose decades
of work have enabled us to recognize that labor—emotional as much as physical—as 
such today. 

As should already be apparent, feminism has been defined and used in many ways. 
Here and throughout the book, we employ the term feminism as a shorthand for the 
diverse and wide-ranging projects that name and challenge sexism and other forces 
of oppression, as well as those which seek to create more just, equitable, and livable 
futures. Because of this broadness, some scholars prefer to use the term feminisms, 
which clearly signals the range of—and, at times, the incompatibilities among—these
various strains of feminist activism and political thought. For reasons of readability, we 
choose to use the term feminism here, but our feminism is intended to be just as expansive.
It includes the work of regular folks like Darden and Champine, public intellectuals
like Betty Friedan and bell hooks, and organizing groups like the Combahee River
Collective, which have taken direct action to achieve the equality of the sexes. It also 
includes the work of scholars and other cultural critics—like Kimberlé Crenshaw and
Margot Lee Shetterly, among many more—who have used writing to explore the social, 
political, historical, and conceptual reasons behind the inequality of the sexes that we 
face today. 

In the process, these writers and activists have given voice to the many ways in 
which today's status quo is unjust. 12 These injustices are often the result of historical
and contemporary differentials of power, including those among men, women,
and nonbinary people, as well as those among white women and Black women, academic
researchers and Indigenous communities, and people in the Global North and
the Global South. Feminists analyze these power differentials so that they can change 
them. Such a broad focus—one that incorporates race, class, ability, and more—would 
have sounded strange to Friedan or to the white women largely credited for leading the 
fight for women's suffrage in the nineteenth century. 13 But the reality is that women of 
color have long insisted that any movement for gender equality must also consider the 
ways in which privilege and oppression are intersectional. 

Because the concept of intersectionality is essential for this whole book, let's get 
a bit more specific. The term was coined by legal theorist Kimberlé Crenshaw in the
late 1980s. 14 In law school, Crenshaw had come across the antidiscrimination case of 
DeGraffenreid v. General Motors. Emma DeGraffenreid was a Black working mother who 
had sought a job at a General Motors factory in her town. She was not hired and sued 
GM for discrimination. The factory did have a history of hiring Black people: many 
Black men worked in industrial and maintenance jobs there. It also had a history of
hiring women: many white women worked there as secretaries. These two pieces of evidence
provided the rationale for the judge to throw out the case. Because the company
did hire Black people and did hire women, it could not be discriminating based on race 
or gender. But, Crenshaw wanted to know, what about discrimination on the basis of 
race and gender together? This was something different, it was real, and it needed to be 
named. Crenshaw not only named the concept, but would go on to explain and elaborate
the idea of intersectionality in award-winning books, papers, and talks. 15

Key to the idea of intersectionality is that it does not only describe the intersecting
aspects of any particular person's identity (or positionalities, as they are sometimes
termed). 16 It also describes the intersecting forces of privilege and oppression at work 
in a given society. Oppression involves the systematic mistreatment of certain groups of 
people by other groups. It happens when power is not distributed equally—when one 
group controls the institutions of law, education, and culture, and uses its power to 
systematically exclude other groups while giving its own group unfair advantages (or 
simply maintaining the status quo). 17 In the case of gender oppression, we can point to 
the sexism, cissexism, and patriarchy that are evident in everything from political representation
to the wage gap to who speaks more often (or more loudly) in a meeting. 18 In
the case of racial oppression, this takes the form of racism and white supremacy. Other 
forms of oppression include ableism, colonialism, and classism. Each has its particular 
history and manifests differently in different cultures and contexts, but all involve a 
dominant group that accrues power and privilege at the expense of others. Moreover, 
these forces of power and privilege on the one hand and oppression on the other mesh 
together in ways that multiply their effects. 

The effects of privilege and oppression are not distributed evenly across all individuals
and groups, however. For some, they become an obvious and unavoidable part of
daily life, particularly for women and people of color and queer people and immigrants: 
the list goes on. If you are a member of any or all of these (or other) minoritized groups, 
you experience their effects everywhere, shaping the choices you make (or don't get to 
make) each day. These systems of power are as real as rain. But forces of oppression can 
be difficult to detect when you benefit from them (we call this a privilege hazard later in 
the book). And this is where data come in: it was a set of intersecting systems of power 
and privilege that Darden was intent on exposing when she posed her initial question 
to her division chief. And it was that same set of intersecting systems of power and 
privilege that Darden sought to challenge when she approached Champine. Darden 
herself didn't need any more evidence of the problem she faced; she was already living 
it every day. 19 But when her experience was recorded as data and aggregated with others'
experiences, it could be used to challenge institutional systems of power and have
far broader impact than on her career trajectory alone. 

In this way, Darden models what we call data feminism: a way of thinking about 
data, both their uses and their limits, that is informed by direct experience, by a commitment
to action, and by intersectional feminist thought. The starting point for data
feminism is something that goes mostly unacknowledged in data science: power is not 
distributed equally in the world. Those who wield power are disproportionately elite, 
straight, white, able-bodied, cisgender men from the Global North. 20 The work of data 
feminism is first to tune into how standard practices in data science serve to reinforce 
these existing inequalities and second to use data science to challenge and change the 
distribution of power. 21 Underlying data feminism is a belief in and commitment to
co-liberation: the idea that oppressive systems of power harm all of us, that they undermine
the quality and validity of our work, and that they hinder us from creating true
and lasting social impact with data science. 

We wrote this book because we are data scientists and data feminists. Although we 
speak as a "we" in this book, and share certain identities, experiences, and skills, we 
have distinct life trajectories and motivations for our work on this project. If we were 
sitting with you right now, we would each introduce ourselves by answering the question:
What brings you here today? Placing ourselves in that scenario, here is what we
would have to say. 

Catherine: I am a hacker mama. I spent fifteen years as a freelance software developer 
and experimental artist, now professor, working on projects ranging from serendipitous
news-recommendation systems to countercartography to civic data literacy to
making breast pumps not suck. I'm here writing this book because, for one, the hype 
around big data and AI is deafeningly male and white and technoheroic, and the time is
now to reframe that world with a feminist lens. The second reason I'm here is that my 
recent experience running a large, equity-focused hackathon taught me just how much 
people like me—basically, well-meaning liberal white people—are part of the problem 
in struggling for social justice. This book is one attempt to expose such workings of 
power, which are inside us as much as outside in the world. 22 

Lauren: I often describe myself as a professional nerd. I worked in software development
before going to grad school to study English, with a particular focus on early
American literature and culture. (Early means very early—like, the eighteenth century.)
As a professor at an engineering school, I now work on research projects that
translate this history into contemporary contexts. For instance, I'm writing a book 
about the history of data visualization, employing machine-learning techniques to 
analyze abolitionist newspapers, and designing a haptic recreation of a hundred-year- 
old visualization scheme that looks like a quilt. Through projects like these, I show 
how the rise of the concept of "data" (which, as it turns out, really took off in the 
eighteenth century) is closely connected to the rise of our current concepts of gender
and race. So one of my reasons for writing this book is to show how the issues of
racism and sexism that we see in data science today are by no means new. The other 
reason is to help translate humanistic thinking into practice and, in so doing, create 
more opportunities for humanities scholars to engage with activists, organizers, and 
communities. 23 
We both strongly believe that data can do good in the world. But for it to do so, we
must explicitly acknowledge that a key way that power and privilege operate in the 
world today has to do with the word data itself. The word dates to the mid-seventeenth 
century, when it was introduced to supplement existing terms such as evidence and fact. 
Identifying information as data, rather than as either of those other two terms, served 
a rhetorical purpose. 24 It converted otherwise debatable information into the solid basis 
for subsequent claims. But what information needs to become data before it can be 
trusted? Or, more precisely, whose information needs to become data before it can be 
considered as fact and acted upon? 25 Data feminism must answer these questions, too. 

The story that begins with Christine Darden entering the gates of Langley, passes 
through her sustained efforts to confront the structural oppression she encountered 
there, and concludes with her impressive array of life achievements, is a story about 
the power of data. Throughout her career, in ways large and small, Darden used data 
to make arguments and transform lives. But that's not all. Darden's feel-good biography
is just as much a story about the larger systems of power that required data—
rather than the belief in her lived experience—to perform that transformative work. An 
institutional mistrust of Darden's experiential knowledge was almost certainly a factor 
in Champine's decision to create her bar chart. Champine likely recognized, as did 
Darden herself, that she would need the bar chart to be believed. 

In this way, the alliance between Darden and Champine, and their work together, 
underscores the flaws and compromises that are inherent in any data-driven project. 
The process of converting life experience into data always necessarily entails a reduction
of that experience—along with the historical and conceptual burdens of the term.
That Darden and Champine were able to view their work as a success despite these 
inherent constraints underscores even more the importance of listening to and learning
from people whose lives and voices are behind the numbers. No dataset or analysis
or visualization or model or algorithm is the result of one person working alone. Data 
feminism can help to remind us that before there are data, there are people—people 
who offer up their experience to be counted and analyzed, people who perform that 
counting and analysis, people who visualize the data and promote the findings of any 
particular project, and people who use the product in the end. There are also, always, 
people who go uncounted—for better or for worse. And there are problems that cannot 
be represented—or addressed—by data alone. And so data feminism, like justice, must 
remain both a goal and a process, one that guides our thoughts and our actions as we 
move forward toward our goal of remaking the world. 
Data and Power


It took five state-of-the-art IBM System/360 Model 75 machines to guide the Apollo 11 
astronauts to the moon. Each was the size of a car and cost $3.5 million. Fast
forward to the present. We now have computers in the form of phones that fit in our 
pockets and—in the case of the 2019 Apple iPhone XR—can perform more than 140 
million more instructions per second than a standard IBM System/360. 26 That rate of 
change is astounding; it represents an exponential growth in computing capacity
(figure 0.2a). We've witnessed an equally exponential growth in our ability to collect and
record information in digital form—and in the ability to have information collected 
about us (figure 0.2b). 
Figure 0.2

(a) The time-series chart included in the original paper on Moore's law, published in 1965, which
posited that the number of transistors that could fit on an integrated circuit (and therefore contribute
to computing capacity) would double every year. Courtesy of Gordon Moore. (b) Several
years ago, researchers concluded that transistors were approaching their smallest size and that
Moore's law would not hold. Nevertheless, today's computing power is what enabled Dr. Katie
Bouman, a postdoctoral fellow at MIT, to contribute to a project that involved processing and
compositing approximately five petabytes of data captured by the Event Horizon Telescope to create
the first-ever image of a black hole. After the publication of this photo in April 2019 showing
her excitement—as one of the scientists on the large team that worked for years to capture the
image—Bouman was subsequently trolled and harassed online. Courtesy of Tamy Emma Pepin/Twitter.
But the act of collecting and recording data about people is not new at all. From the
registers of the dead that were published by church officials in the early modern era to 
the counts of Indigenous populations that appeared in colonial accounts of the Americas,
data collection has long been employed as a technique of consolidating knowledge
about the people whose data are collected, and therefore consolidating power over 
their lives. 27 The close relationship between data and power is perhaps most clearly 
visible in the historical arc that begins with the logs of people captured and placed 
aboard slave ships, reducing richly lived lives to numbers and names. It passes through 
the eugenics movement, in the late nineteenth and early twentieth centuries, which 
sought to employ data to quantify the superiority of white people over all others. It 
continues today in the proliferation of biometrics technologies that, as sociologist Simone
Browne has shown, are disproportionately deployed to surveil Black bodies. 28

When Edward Snowden, the former US National Security Agency contractor, leaked 
his cache of classified documents to the press in 2013, he revealed the degree to which 
the federal government routinely collects data on its citizens—often with minimal 
regard to legality or ethics. 29 At the municipal level, too, governments are starting to 
collect data on everything from traffic movement to facial expressions in the interests 
of making cities "smarter." 30 This often translates to reinscribing traditional urban patterns
of power such as segregation, the overpolicing of communities of color, and the
rationing of ever-scarcer city services. 31 

But the government is not alone in these data-collection efforts; corporations do it 
too—with profit as their guide. The words and phrases we search for on Google, the 
times of day we are most active on Facebook, and the number of items we add to our 
Amazon carts are all tracked and stored as data—data that are then converted into 
corporate financial gain. The most trivial of everyday actions—searching for a way 
around traffic, liking a friend's cat video, or even stepping out of our front doors in 
the morning—are now hot commodities. This is not because any of these actions are 
exceptionally interesting (although we do make an exception for Catherine’s cats) but 
because these tiny actions can be combined with other tiny actions to generate targeted 
advertisements and personalized recommendations—in other words, to give us more 
things to click on, like, or buy. 32 

This is the data economy, and corporations, often aided by academic researchers, 
are currently scrambling to see what behaviors—both online and off—remain to be 
turned into data and then monetized. Nothing is outside of datafication, as this process 
is sometimes termed—not your search history, or Catherine's cats, or the butt that 
Lauren is currently using to sit in her seat. To wit: Shigeomi Koshimizu, a Tokyo-based 
professor of engineering, has been designing matrices of sensors that collect data at 360 
different positions around a rear end while it is comfortably ensconced in a chair. 33 He
proposes that people have unique butt signatures, as unique as their fingerprints. In the 
future, he suggests, our cars could be outfitted with butt-scanners instead of keys or car 
alarms to identify the driver. 

Although datafication may occasionally verge into the realm of the absurd, it 
remains a very serious issue. Decisions of civic, economic, and individual importance
are already, and increasingly, being made by automated systems sifting through
large amounts of data. For example, PredPol, a so-called predictive policing company 
founded in 2012 by an anthropology professor at the University of California, Los 
Angeles, has been employed by the City of Los Angeles for nearly a decade to determine 
which neighborhoods to patrol more heavily, and which neighborhoods to (mostly) 
ignore. But because PredPol is based on historical crime data and US policing practices 
have always disproportionately surveilled and patrolled neighborhoods of color, the 
predictions of where crime will happen in the future look a lot like the racist practices 
of the past. 34 These systems create what mathematician and writer Cathy O’Neil, in 
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, 
calls a "pernicious feedback loop," amplifying the effects of racial bias and of the criminalization
of poverty that are already endemic to the United States.
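The feedback loop O'Neil describes can be made concrete with a toy simulation. The sketch below is purely illustrative and is not PredPol's actual model. It assumes two hypothetical neighborhoods, A and B, with identical underlying crime rates. It also assumes that crime is recorded only where officers are present to observe it, and that each round the neighborhood with more recorded crime is flagged as a "hotspot" and receives a fixed majority of patrols. The function name and its parameters are hypothetical, introduced only for this illustration.

```python
"""A toy, hypothetical model of a predictive-policing feedback loop.

Assumptions (illustrative only, not PredPol's method):
  * neighborhoods A and B have the SAME true crime rate;
  * crime is recorded only where patrols are present to observe it;
  * each round, the area with more recorded crime becomes the "hotspot"
    and receives a fixed majority share of patrols.
"""


def simulate_feedback_loop(initial_share_a: float,
                           true_rate: float = 0.1,
                           hotspot_share: float = 0.8,
                           rounds: int = 5,
                           patrols: int = 100) -> list[float]:
    """Return A's share of patrols at the start and after each round."""
    recorded = {"A": 0.0, "B": 0.0}  # cumulative recorded incidents per area
    share_a = initial_share_a
    history = [share_a]
    for _ in range(rounds):
        # Recorded crime tracks police presence, not actual crime:
        # both areas use the same true_rate.
        recorded["A"] += patrols * share_a * true_rate
        recorded["B"] += patrols * (1 - share_a) * true_rate
        # The "prediction": the area with more recorded crime gets most
        # patrols. On an exact tie, the current allocation is unchanged.
        if recorded["A"] > recorded["B"]:
            share_a = hotspot_share
        elif recorded["B"] > recorded["A"]:
            share_a = 1 - hotspot_share
        history.append(share_a)
    return history


if __name__ == "__main__":
    # A starts with only slightly more patrols than B (60/40).
    print(simulate_feedback_loop(initial_share_a=0.6))
    # -> [0.6, 0.8, 0.8, 0.8, 0.8, 0.8]
```

In this toy model, a small 60/40 imbalance in the starting allocation grows to 80/20 after one round and then stays there, even though the two neighborhoods have the same true rate. The model never observes crime where it does not patrol, so its own past allocation becomes its evidence. With an even 50/50 start, the recorded totals stay tied and nothing changes. The disparity therefore comes from the historical allocation, not from any difference in underlying behavior.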

O'Neil's solution is to open up the computational systems that produce these racist 
results. Only by knowing what goes in, she argues, can we understand what comes out. 
This is a key step in the project of mitigating the effects of biased data. Data feminism 
additionally requires that we trace those biased data back to their source. PredPol and 
the "three most objective data points" that it employs certainly amplify existing biases, 
but they are not the root cause. 35 The cause, rather, is the long history of the criminalization
of Blackness in the United States, which produces biased policing practices,
which produce biased historical data, which are then used to develop risk models for 
the future. 36 Tracing these links to historical and ongoing forces of oppression can help 
us answer the ethical question, Should this system exist? 37 In the case of PredPol, the 
answer is a resounding no. 

Understanding this long and complicated chain reaction is what has motivated 
Yeshimabeit Milner, along with Boston-based activists, organizers, and mathematicians,
to found Data for Black Lives, an organization dedicated to "using data science
to create concrete and measurable change in the lives of Black communities." 38 Groups
like the Stop LAPD Spying coalition are using explicitly feminist and antiracist methods
to quantify and challenge invasive data collection by law enforcement. 39 Data
journalists are reverse-engineering algorithms and collecting qualitative data at scale 
about maternal harm. 40 Artists are inviting participants to perform ecological maps and
using AI to make intergenerational family memoirs (figure 0.3a). 41

All these projects are data science. Many people think of data as numbers alone, 
but data can also consist of words or stories, colors or sounds, or any type of information
that is systematically collected, organized, and analyzed (figures 0.3b, 0.3c). 42
The science in data science simply implies a commitment to systematic methods of 
observation and experiment. Throughout this book, we deliberately place diverse data 
science examples alongside each other. They come from individuals and small groups, 
and from across academic, artistic, nonprofit, journalistic, community-based, and for- 
profit organizations. This is due to our belief in a capacious definition of data science,
one that seeks to include rather than exclude and does not erect barriers based
on formal credentials, professional affiliation, size of data, complexity of technical 
methods, or other external markers of expertise. Such markers, after all, have long 
been used to prevent women from fully engaging in any number of professional fields, 
even as those fields—which include data science and computer science, among many 
others—were largely built on the knowledge that women were required to teach themselves.
43 An attempt to push back against this gendered history is foundational to data
feminism, too. 

Throughout its own history, feminism has consistently had to work to convince the 
world that it is relevant to people of all genders. We make the same argument: that 
data feminism is for everybody. (And here we borrow a line from bell hooks.) 44 You will 
notice that the examples we use are not only about women, nor are they created only 
by women. That's because data feminism isn't only about women. It takes more than one 
gender to have gender inequality and more than one gender to work toward justice. 
Likewise, data feminism isn't only for women. Men, nonbinary, and genderqueer people 
are proud to call themselves feminists and use feminist thought in their work. Moreover,
data feminism isn't only about gender. Intersectional feminists have keyed us into
how race, class, sexuality, ability, age, religion, geography, and more are factors that 
together influence each person's experience and opportunities in the world. Finally, 
data feminism is about power—about who has it and who doesn't. Intersectional feminism 
examines unequal power. And in our contemporary world, data is power too. Because 
the power of data is wielded unjustly, it must be challenged and changed. 

Data Feminism in Action 

Data is a double-edged sword. In a very real sense, data have been used as a weapon
by those in power to consolidate their control—over places and things, as well as
people. Indeed, a central goal of this book is to show how governments and corporations
have long employed data and statistics as management techniques to preserve an
unequal status quo. Working with data from a feminist perspective requires knowing
and acknowledging this history. To frame the trouble with data in another way: it's
not a coincidence that the institution that employed Christine Darden and enabled
her professional rise is the same one that wielded the results of her data analysis to assert
the technological superiority of the United States over its communist adversaries and
to plant an American flag on the moon. But this flawed history does not mean ceding
control of the future to the powers of the past. Data are part of the problem, to be sure.
But they are also part of the solution. Another central goal of this book is to show how
the power of data can be wielded back.

Figure 0.3

We define data science expansively in this book. Here are three examples. (a) Not the Only One by
Stephanie Dinkins (2017) is a sculpture that features a Black family through the use of artificial
intelligence. The AI is trained and taught by the underrepresented voices of Black and brown
individuals in the tech sector. (b) Researcher Margaret Mitchell and colleagues, in "Seeing through the
Human Reporting Bias" (2016), have worked on systems to infer what is not said in human speech
for the purposes of image classification. For example, people say "green bananas" but not "yellow
bananas" because yellow is implied as the default color of the banana. Similarly, people say
"woman doctor" but do not say "man doctor," so it is the words that are not spoken that encode
the bias. (c) A gender analysis of Hollywood film dialogue, "Film Dialogue from 2,000 Screenplays
Broken Down by Gender and Age," by Hanah Anderson and Matt Daniels, created for The Pudding,
a data journalism start-up (2017).

Figure 0.3 (continued)

[Image examples with human-written captions, each compared against a "Human Label" and a "Visual Label": (a) "A woman standing next to a bicycle with basket." (b) "A city street filled with lots of people walking in the rain." (c) "A yellow Vespa parked in a lot with other cars." (d) "A store display that has a lot of bananas on sale."]

To guide us in this work, we have developed seven core principles. Individually 
and together, these principles emerge from the foundation of intersectional feminist 
thought. Each of the following chapters is structured around a single principle. The 
seven principles of data feminism are as follows: 

1. Examine power. Data feminism begins by analyzing how power operates in the 
world. 

2. Challenge power. Data feminism commits to challenging unequal power structures 
and working toward justice. 

3. Elevate emotion and embodiment. Data feminism teaches us to value multiple 
forms of knowledge, including the knowledge that comes from people as living, 
feeling bodies in the world. 

4. Rethink binaries and hierarchies. Data feminism requires us to challenge the gender
binary, along with other systems of counting and classification that perpetuate
oppression. 

5. Embrace pluralism. Data feminism insists that the most complete knowledge comes 
from synthesizing multiple perspectives, with priority given to local, Indigenous, 
and experiential ways of knowing. 

6. Consider context. Data feminism asserts that data are not neutral or objective. They 
are the products of unequal social relations, and this context is essential for conducting
accurate, ethical analysis.

7. Make labor visible. The work of data science, like all work in the world, is the work 
of many hands. Data feminism makes this labor visible so that it can be recognized 
and valued. 

Each of the following chapters takes up one of these principles, drawing upon examples
from the field of data science, expansively defined, to show how that principle
can be put into action. Along the way, we introduce key feminist concepts like the 
matrix of domination (Patricia Hill Collins; see chapter 1), situated knowledge (Donna 
Haraway; see chapter 3), and emotional labor (Arlie Hochschild; see chapter 7), as well
as some of our own ideas about what data feminism looks like in theory and practice. 
To this end, we introduce you to people at the cutting edge of data and justice. These 
include engineers and software developers, activists and community organizers, data 
journalists, artists, and scholars. This range of people, and the range of projects they 
have helped to create, is our way of answering the question: What makes a project 
feminist? As will become clear, a project may be feminist in content, in that it challenges 
power by choice of subject matter; in form, in that it challenges power by shifting the 
aesthetic and/or sensory registers of data communication; and/or in process, in that it 
challenges power by building participatory, inclusive processes of knowledge production.
What unites this broad scope of data-based work is a commitment to action and
a desire to remake the world. 

Our overarching goal is to take a stand against the status quo—against a world that 
benefits us, two white college professors, at the expense of others. To work toward 
this goal, we have chosen to feature the voices of those who speak from the margins, 
whether because of their gender, sexuality, race, ability, class, geographic location, or 
any combination of those (and other) subject positions. We have done so, moreover, 
because of our belief that those with direct experience of inequality know better than
we do about what actions to take next. For this reason, we have attempted to prioritize 
the work of people in closer proximity to issues of inequality over those who study 
inequality from a distance. In this book, we pay particular attention to inequalities 
at the intersection of gender and race. This reflects our location in the United States, 
where the most entrenched issues of inequality have racism at their source. Our values
statement, included as an appendix to this book, discusses the rationale for these
authorial choices in more detail. 

Any book involves making choices about whose voices and whose work to include 
and whose voices and work to omit. We ask that those who find their perspectives 
insufficiently addressed or their work insufficiently acknowledged view these gaps as 
additional openings for conversation. Our sincere hope is to contribute in a small way 
to a much larger conversation, one that began long before we embarked upon this writ¬ 
ing process and that will continue long after these pages are through. 

This book is intended to provide concrete steps to action for data scientists seeking 
to learn how feminism can help them work toward justice, and for feminists seeking to 
learn how their own work can carry over to the growing field of data science. It is also 
addressed to professionals in all fields in which data-driven decisions are being made, 
as well as to communities that want to resist or mobilize the data that surrounds them. 
It is written for everyone who seeks to better understand the charts and statistics that 
they encounter in their day-to-day lives, and for everyone who seeks to communicate 
the significance of such charts and statistics to others. 

Our claim, once again, is that data feminism is for everyone. It's for people of all 
genders. It's by people of all genders. And most importantly: it's about much more than 
gender. Data feminism is about power, about who has it and who doesn't, and about 
how those differentials of power can be challenged and changed using data. We invite 
you, the readers of this book, to join us on this journey toward justice and toward 
remaking our data-driven world. 



1 The Power Chapter 


Principle: Examine Power 

Data feminism begins by analyzing how power operates in the world. 


When tennis star Serena Williams disappeared from Instagram in early September 2017, 
her six million followers assumed they knew what had happened. Several months earlier,
in March of that year, Williams had accidentally announced her pregnancy to the
world via a bathing suit selfie and a caption that was hard to misinterpret: "20 weeks." 
Now, they thought, her baby had finally arrived. 

But then they waited, and waited some more. Two weeks later, Williams finally 
reappeared, announcing the birth of her daughter and inviting her followers to watch 
a video that welcomed Alexis Olympia Ohanian Jr. to the world. 1 The video was a montage
of baby bump pics interspersed with clips of a pregnant Williams playing tennis
and having cute conversations with her husband, Reddit cofounder Alexis Ohanian, 
and then, finally, the shot that her fans had been waiting for: the first clip of baby 
Olympia. "So we're leaving the hospital," Williams narrates. "It's
been a long time. We had a lot of complications. But look who we got!" The scene fades 
to white, and the video ends with a set of stats: Olympia's date of birth, birth weight, 
and number of grand slam titles: 1. (Williams, as it turned out, was already eight weeks 
pregnant when she won the Australian Open earlier that year.) 

Williams's Instagram followers were, for the most part, enchanted. But soon, the 
enthusiastic congratulations were superseded by a very different conversation. A number
of her followers—many of them Black women like Williams herself—fixated on the
comment she'd made as she was heading home from the hospital with her baby girl. 
Those "complications" that Williams experienced—other women had had them too. 
In Williams's case, the complications had been life-threatening, and her self-advocacy 
in the hospital played a major role in her survival. 

On Williams's Instagram feed, dozens of women began posting their own experi¬ 
ences of childbirth gone horribly wrong. A few months later, Williams returned to 
social media—Facebook, this time—to continue the conversation (figure 1.1). Citing a 
2017 statement from the US Centers for Disease Control and Prevention (CDC), Williams
wrote that "Black women are over 3 times more likely than white women to die
from pregnancy- or childbirth-related causes." 2 

These disparities were already well known to Black-women-led reproductive justice
groups like SisterSong, the Black Mamas Matter Alliance, and Raising Our Sisters
Everywhere (ROSE), some of whom had been working on the maternal health crisis
for decades. Williams helped to shine a national spotlight on them. The mainstream
media had also recently begun to pay more attention to the crisis. A few
months earlier, Nina Martin of the investigative journalism outfit ProPublica, working
with Renee Montagne of NPR, had reported on the same phenomenon. 3 "Nothing
Protects Black Women from Dying in Pregnancy and Childbirth," the headline
read. In addition to the study cited by Williams, Martin and Montagne cited a second
study from 2016, which showed that neither education nor income level—the
factors usually invoked when attempting to account for healthcare outcomes that
diverge along racial lines—impacted the fates of Black women giving birth. 4 On the
contrary, the data showed that Black women with college degrees suffered more severe
complications of pregnancy and childbirth than white women without high school
diplomas.

Serena Williams, January 15, 2018 (Facebook)

I didn't expect that sharing our family's story of Olympia's birth and all of complications after giving
birth would start such an outpouring of discussion from women — especially black women — who have
faced similar complications and women whose problems go unaddressed.

These aren't just stories: according to the CDC, (Center for Disease Control) black women are over 3
times more likely than White women to die from pregnancy- or childbirth-related causes. We have a lot
of work to do as a nation and I hope my story can inspire a conversation that gets us to close this gap.

Let me be clear: EVERY mother, regardless of race, or background deserves to have a healthy
pregnancy and childbirth. I personally want all women of all colors to have the best experience they can
have. My personal experience was not great but it was MY experience and I'm happy it happened to
me. It made me stronger and it made me appreciate women — both women with and without kids —
even more. We are powerful!!!

I want to thank all of you who have opened up through online comments and other platforms to tell
your story. I encourage you to continue to tell those stories. This helps. We can help others. Our voices
are our power.

Figure 1.1

A Facebook post by Serena Williams responding to her Instagram followers who had shared their
stories of pregnancy and childbirth-related complications with her. Image from Serena Williams,
January 15, 2018.

So what were these complications, more precisely? And how many women had 
actually died as a result? Nobody was counting. A 2014 United Nations report, coauthored
by SisterSong, described the state of data collection on maternal mortality in
the United States as "particularly weak." 5 The situation hadn't improved in 2017, when
ProPublica began its reporting. In 2018, USA Today investigated these racial disparities,
and found an even more fundamental problem: there was still no national
system for tracking complications sustained in pregnancy and childbirth, even though 
similar systems had long been in place for tracking any number of other health issues, 
such as teen pregnancy, hip replacements, or heart attacks. 6 They also found that there 
was still no reporting mechanism for ensuring that hospitals follow national safety 
standards, as is required for both hip surgery and cardiac care. "Our maternal data 
is embarrassing," stated Stacie Geller, a professor of obstetrics and gynecology at the 
University of Illinois, when asked for comment. The chief of the CDC's Maternal and 
Infant Health branch, William Callaghan, makes the significance of this "embarrassing"
data more clear: "What we choose to measure is a statement of what we value in
health," he explains. 7 We might edit his statement to add that it's a measure of who we 
value in health, too. 8 

Why did it take the near-death of an international sports superstar for the media 
to begin paying attention to an issue that less famous Black women had been experiencing
and organizing around for decades? Why did it take reporting by the predominantly
white mainstream press for US cities and states to begin collecting data on
the issue? 9 Why are those data still not viewed as big enough, statistically significant
enough, or of high enough quality for those cities and states, and other public institutions,
to justify taking action? And why didn't those institutions just #believeblackwomen
in the first place? 10

The answers to these questions are directly connected to larger issues of power and 
privilege. Williams recognized as much when asked by Glamour magazine about the 
fact that she had to demand that her medical team perform additional tests in order 
to diagnose her own postnatal complications—and because she was Serena Williams, 
twenty-three-time grand slam champion, they complied. 11 "If I wasn't who I am, it 
could have been me," she told Glamour, referring to the fact that the privilege she 
experienced as a tennis star intersected with the oppression she experienced as a Black 
woman, enabling her to avoid becoming a statistic herself. As Williams asserted, "that's
not fair." 12 

Needless to say, Williams is right. It's absolutely not fair. So how do we mitigate this 
unfairness? We begin by examining systems of power and how they intersect—like 
how the influences of racism, sexism, and celebrity came together first to send Williams
into a medical crisis and then, thankfully, to keep her alive. The complexity of
these intersections is the reason that "examine power" is the first principle of data feminism,
and the focus of this chapter. Examining power means naming and explaining
the forces of oppression that are so baked into our daily lives—and into our datasets,
our databases, and our algorithms—that we often don't even see them. Seeing oppression
is especially hard for those of us who occupy positions of privilege. But once we
identify these forces and begin to understand how they exert their potent force, then 
many of the additional principles of data feminism—like challenging power (chapter 2), 
embracing emotion (chapter 3), and making labor visible (chapter 7)—become easier to 
undertake. 

Power and the Matrix of Domination 

But first, what do we mean by power? We use the term power to describe the current
configuration of structural privilege and structural oppression, in which some groups
experience unearned advantages—because various systems have been designed by
people like them and work for people like them—and other groups experience systematic
disadvantages—because those same systems were not designed by them or with people
like them in mind. These mechanisms are complicated, and there are "few pure victims
and oppressors," notes influential sociologist Patricia Hill Collins. In her landmark
text, Black Feminist Thought, first published in 1990, Collins proposes the concept of the
matrix of domination to explain how systems of power are configured and experienced. 13
It consists of four domains: the structural, the disciplinary, the hegemonic, and the
interpersonal. Her emphasis is on the intersection of gender and race, but she makes
clear that other dimensions of identity (sexuality, geography, ability, etc.) also result in
unjust oppression, or unearned privilege, that become apparent across the same four
domains.

The structural domain is the arena of laws and policies, along with schools and institutions
that implement them. This domain organizes and codifies oppression. Take,
for example, the history of voting rights in the United States. The US Constitution did
not originally specify who was authorized to vote, so various states had different policies
that reflected their local politics. Most had to do with owning property, which,
conveniently, only men could do. But with the passage of the Fourteenth Amendment
in 1868, which granted the rights of US citizenship to those who had been enslaved,
the nature of those rights—including voting—was required to be spelled out at the
national level for the first time. More specifically, voting was defined as a right reserved
for "male citizens." This is a clear instance of codified oppression in the structural
domain.

Table 1.1

The four domains of the matrix of domination

Domain                 Function
Structural domain      Organizes oppression: laws and policies.
Disciplinary domain    Administers and manages oppression. Implements and enforces laws and policies.
Hegemonic domain       Circulates oppressive ideas: culture and media.
Interpersonal domain   Individual experiences of oppression.

Chart based on concepts introduced by Patricia Hill Collins in Black Feminist Thought: Knowledge,
Consciousness, and the Politics of Empowerment.

It would take until the passage of the Nineteenth Amendment in 1920 for most (but 
not all) women to be granted the right to vote. 14 Even still, many state voting laws continued
to include literacy tests, residency requirements, and other ways to indirectly
exclude people who were not property-owning white men. These restrictions persist 
today, in the form of practices like dropping names from voter rolls, requiring photo 
IDs, and limits to early voting—the burdens of which are felt disproportionately by 
low-income people, people of color, and others who lack the time or resources to jump 
through these additional bureaucratic hoops. 15 This is the disciplinary domain that Collins
names: the domain that administers and manages oppression through bureaucracy
and hierarchy, rather than through laws that explicitly encode inequality on the basis 
of someone's identity. 16 

Neither of these domains would be possible without the hegemonic domain, which 
deals with the realm of culture, media, and ideas. Discriminatory policies and practices
in voting can only be enacted in a world that already circulates oppressive
ideas about, for example, who counts as a citizen in the first place. Consider an antisuffragist
pamphlet from the 1910s that proclaims, "You do not need a ballot to clean
out your sink spout." 17 Pamphlets like these, designed to be literally passed from hand
to hand, reinforced preexisting societal views about the place of women in society.
Today, we have animated GIFs instead of paper pamphlets, but the hegemonic function
is the same: to consolidate ideas about who is entitled to exercise power and who
is not. 
The final part of the matrix of domination is the interpersonal domain, which influences
the everyday experience of individuals in the world. How would you feel if you
were a woman who read that pamphlet, for example? Would it have more or less of 
an impact if a male family member gave it to you? Or, for a more recent example, how 
would you feel if you took time off from your hourly job to go cast your vote, only to 
discover when you got there that your name had been purged from the official voting 
roll or that there was a line so long that it would require that you miss half a day's pay, 
or stand for hours in the cold, or ... the list could go on. These are examples of how it 
feels to know that systems of power are not on your side and, at times, are actively seeking
to take away the small amount of power that you do possess. 18

The matrix of domination works to uphold the undue privilege of dominant groups 
while unfairly oppressing minoritized groups. What does this mean? Beginning in this 
chapter and continuing throughout the book, we use the term minoritized to describe 
groups of people who are positioned in opposition to a more powerful social group. 
While the term minority describes a social group that is made up of fewer people,
minoritized indicates that a social group is actively devalued and oppressed by a dominant
group, one that holds more economic, social, and political power. With respect
to gender, for example, men constitute the dominant group, while all other genders
constitute minoritized groups. This remains true even though women make up roughly half
of the world's population. Sexism is the term that names this form of oppression.
In relation to race, white people constitute the dominant group (racism); in relation
to class, wealthy and educated people constitute the dominant group (classism);
and so on. 19

Using the concept of the matrix of domination and the distinction between dominant
and minoritized groups, we can begin to examine how power unfolds in and
around data. This often means asking uncomfortable questions: who is doing the work 
of data science (and who is not)? Whose goals are prioritized in data science (and whose 
are not)? And who benefits from data science (and who is either overlooked or actively 
harmed)? 20 These questions are uncomfortable because they unmask the inconvenient 
truth that there are groups of people who are disproportionately benefitting from data 
science, and there are groups of people who are disproportionately harmed. Asking 
these who questions allows us, as data scientists ourselves, to start to see how privilege is 
baked into our data practices and our data products. 21 

Data Science by Whom? 

It is important to acknowledge the elephant in the server room: the demographics
of data science (and related occupations like software engineering and artificial
intelligence research) do not represent the population as a whole. According to the
most recent data from the US Bureau of Labor Statistics, released in 2018, only 26 percent
of those in "computer and mathematical occupations" are women. 22 And across
all of those women, only 12 percent are Black or Latinx women, even though Black and
Latinx women make up 22.5 percent of the US population. 23 A report by the research
group AI Now about the diversity crisis in artificial intelligence notes that women comprise
only 15 percent of AI research staff at Facebook and 10 percent at Google. 24 These
numbers are probably not a surprise. The more surprising thing is that those numbers
are getting worse, not better. According to a research report published by the American
Association of University Women in 2015, women computer science graduates in
the United States peaked in the mid-1980s at 37 percent, and we have seen a steady
decline in the years since then to 26 percent today (figure 1.2). 25 As "data analysts"
(low-status number crunchers) have been rebranded as "data scientists" (high-status
researchers), women are being pushed out in order to make room for more highly valued
and more highly compensated men. 26

Figure 1.2

Computer science has always been dominated by men and the situation is worsening (even while
many other scientific and technical fields have made significant strides toward gender parity).
Women awarded bachelor's degrees in computer science in the United States peaked in the mid-1980s
at 37 percent, and we have seen a steady increase in the ratio of men to women in the
years since then. This particular report treated gender as a binary, so there was no data about
nonbinary people. Graphic by Catherine D'Ignazio. Data from the National Center for Education
Statistics.

Disparities in the higher education pipeline exist not only along gender lines. The same report noted specific underrepresentation for Native American women, multiracial women, white women, and all Black and Latinx people. So is it really a surprise that each day brings a new example of data science being used to disempower and oppress minoritized groups? In 2018, it was revealed that Amazon had been developing an algorithm to screen its first-round job applicants. But because the model had been trained on the resumes of prior applicants, who were predominantly male, it developed an even stronger preference for male applicants. It downgraded resumes containing the word women and those from graduates of women's colleges. Ultimately, Amazon had to cancel the project. 27 This example reinforces the work of Safiya Umoja Noble, whose book Algorithms of Oppression has shown how both gender and racial biases are encoded into some of the most pervasive data-driven systems—including Google search, which boasts over five billion unique web searches per day. Noble describes how, as recently as 2016, comparable searches for "three Black teenagers" and "three white teenagers" turned up wildly different representations of those teens. The former returned mugshots, while the latter returned wholesome stock photography. 28

The problems of gender and racial bias in our information systems are complex, but 
some of their key causes are plain as day: the data that shape them, and the models 
designed to put those data to use, are created by small groups of people and then scaled 
up to users around the globe. But those small groups are not at all representative of the 
globe as a whole, nor even of a single city in the United States. When data teams are 
primarily composed of people from dominant groups, those perspectives come to exert 
outsized influence on the decisions being made—to the exclusion of other identities 
and perspectives. This is not usually intentional; it comes from the ignorance of being 
on top. We describe this deficiency as a privilege hazard. 

How does this come to pass? Let's take a minute to imagine what life is like for someone who epitomizes the dominant group in data science: a straight, white, cisgender man with formal technical credentials who lives in the United States. When he looks for a home or applies for a credit card, people are eager for his business. People smile when he holds his girlfriend's hand in public. His body doesn't change due to childbirth or breastfeeding, so he does not need to think about workplace accommodations. He presents his social security number on job applications as a formality, but it never hinders his application from being processed or brings him unwanted attention. The ease with which he traverses the world is invisible to him because it has been designed for people just like him. He does not think about how life might be different for everyone else. In fact, it is difficult for him to imagine that at all.

This is the privilege hazard: the phenomenon that makes those who occupy the most privileged positions among us—those with good educations, respected credentials, and professional accolades—so poorly equipped to recognize instances of oppression in the world. 29 They lack what Anita Gurumurthy, executive director of IT for Change, has called "the empiricism of lived experience." 30 And this lack of lived experience—this evidence of how things truly are—profoundly limits their ability to foresee and prevent harm, to identify existing problems in the world, and to imagine possible solutions.

The privilege hazard occurs at the level of the individual—in the interpersonal domain of the matrix of domination—but it is much more harmful in aggregate because it reaches the hegemonic, disciplinary, and structural domains as well. So it matters deeply that data science and artificial intelligence are dominated by elite white men, because it means there is a collective privilege hazard so great that it would be a profound surprise if they could actually identify instances of bias before unleashing their systems onto the world. Social scientist Kate Crawford has advanced the idea that the biggest threat from artificial intelligence systems is not that they will become smarter than humans, but rather that they will hard-code sexism, racism, and other forms of discrimination into the digital infrastructure of our societies. 31

What's more, the same cis het white men responsible for designing those systems lack the ability to detect harms and biases in their systems once they've been released into the world. 32 In the case of the "three teenagers" Google searches, for example, it was a young Black teenager who pointed out the problem and a Black scholar who wrote about the problem. The burden consistently falls upon those more intimately familiar with the privilege hazard—in data science as in life—to call out the creators of those systems for their limitations.

For example, Joy Buolamwini, a Ghanaian-American graduate student at MIT, was working on a class project using facial-analysis software. 33 But there was a problem—the software couldn't "see" Buolamwini's dark-skinned face (where "seeing" means that it detected a face in the image, like when a phone camera draws a square around a person's face in the frame). It had no problem seeing her lighter-skinned collaborators. She tried drawing a face on her hand and putting it in front of the camera; it detected that. Finally, Buolamwini put on a white mask, essentially going in "whiteface" (figure 1.3). 34 The system detected the mask's facial features perfectly.

Figure 1.3

Joy Buolamwini found that she had to put on a white mask for the facial detection program to "see" her face. Buolamwini is now founder of the Algorithmic Justice League. Courtesy of Joy Buolamwini.

Digging deeper into the code and benchmarking data behind these systems, Buolamwini discovered that the dataset on which many facial-recognition algorithms are tested contains 78 percent male faces and 84 percent white faces. When she did an intersectional breakdown of another test dataset—looking at gender and skin type together—only 4 percent of the faces in that dataset were women and dark-skinned. In their evaluation of three commercial systems, Buolamwini and computer scientist Timnit Gebru showed that darker-skinned women were up to forty-four times more likely to be misclassified than lighter-skinned males. 35 It's no wonder that the software failed to detect Buolamwini's face: both the training data and the benchmarking data relegate women of color to a tiny fraction of the overall dataset. 36
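To make the idea of an intersectional breakdown concrete, here is a minimal sketch in Python. It assumes invented toy records; the counts are not the benchmark data Buolamwini and Gebru audited, and the helper name `share` is hypothetical. It shows how each minoritized group can look reasonably visible when gender and skin type are counted separately, while the group defined by both attributes together stays small.

```python
from collections import Counter

# Toy records of (gender, skin_type). These counts are invented for
# illustration only; they are NOT the data from the published audits.
records = (
    [("male", "lighter")] * 60
    + [("male", "darker")] * 15
    + [("female", "lighter")] * 20
    + [("female", "darker")] * 5
)


def share(rows, predicate):
    """Return the fraction of rows for which predicate(row) is true."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


# Single-attribute shares: each attribute counted on its own.
female_share = share(records, lambda r: r[0] == "female")
darker_share = share(records, lambda r: r[1] == "darker")

# Intersectional share: both attributes considered together.
darker_female_share = share(records, lambda r: r == ("female", "darker"))

# Full joint breakdown: one count per (gender, skin_type) cell.
joint_counts = Counter(records)

print(f"women: {female_share:.0%}")
print(f"darker-skinned: {darker_share:.0%}")
print(f"darker-skinned women: {darker_female_share:.0%}")
for (gender, skin), count in sorted(joint_counts.items()):
    print(f"{gender:>6} / {skin:<7} {count}")
```

In this toy data, women are a quarter of the records and darker-skinned faces a fifth, yet darker-skinned women are only one in twenty. A report that gave only the single-attribute shares would never surface that smallest cell.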

This is the privilege hazard in action—that no coder, tester, or user of the software had previously identified such a problem or even thought to look. Buolamwini's work has been widely covered by the national media (by the New York Times, CNN, the Economist, Bloomberg Businessweek, and others) in articles that typically contain a hint of shock. 37 This is a testament to the social, political, and technical importance of the work, as well as to how those in positions of power—not just in the field of data science, but in the mainstream media, in elected government, and at the heads of corporations—are so often surprised to learn that their "intelligent technologies" are not so intelligent after all. (They need to read data journalist Meredith Broussard's book Artificial Unintelligence.) 38 For another example, think back to the introduction of this
book, where we quoted Shetterly as reporting that Christine Darden's white male manager was "shocked at the disparity" between the promotion rates of men and women. We can speculate that Darden herself wasn't shocked, just as Buolamwini and Gebru likely were not entirely shocked at the outcome of their study either. When sexism, racism, and other forms of oppression are publicly unmasked, it is almost never surprising to those who experience them.

For people in positions of power and privilege, issues of race and gender and class and ability—to name only a few—are OPP: other people's problems. Author and antiracist educator Robin DiAngelo describes instances like the "shock" of Darden's boss or the surprise in the media coverage of Buolamwini's various projects as a symptom of the "racial innocence" of white people. 39 In other words, those who occupy positions of privilege in society are able to remain innocent of that privilege. Race becomes something that only people of color have. Gender becomes something that only women and nonbinary people have. Sexual orientation becomes something that all people except heterosexual people have. And so on. A personal anecdote might help illustrate this point. When we published the first draft of this book online, Catherine told a colleague about it. His earnestly enthusiastic response was, "Oh great! I'll show it to my female graduate students!" To which Catherine rejoined, "You might want to show it to your other students, too."

If things were different—if the 79 percent of engineers at Google who are male were specifically trained in structural oppression before building their data systems (as social workers are before they undertake social work)—then their overrepresentation might be very slightly less of a problem. 40 But in the meantime, the onus falls on the individuals who already feel the adverse effects of those systems of power to prove, over and over again, that racism and sexism exist—in datasets, in data systems, and in data science, as in everywhere else.

Buolamwini and Gebru identified how pale and male faces were overrepresented in facial detection training data. Could we just fix this problem by diversifying the dataset? One solution to the problem would appear to be straightforward: create a more representative set of training and benchmarking data for facial detection models. In fact, tech companies are starting to do exactly this. In January 2019, IBM released a database of one million faces called Diversity in Faces (DiF). 41 In another example, journalist Amy Hawkins details how CloudWalk, a startup in China in need of more images of faces of people of African descent, signed a deal with the Zimbabwean government for it to provide the images the company was lacking. 42 In return for sharing its data, Zimbabwe will receive a national facial database and "smart" surveillance infrastructure that it can install in airports, railways, and bus stations.
It might sound like an even exchange, but Zimbabwe has a dismal record on human rights. Making things worse, CloudWalk provides facial recognition technologies to the Chinese police—a conflict of interest so great that the global nonprofit Human Rights Watch voiced its concern about the deal. 43 Face harvesting is happening in the US as well. Researchers Os Keyes, Nikki Stevens, and Jacqueline Wernimont have shown how immigrants, abused children, and dead people are some of the groups whose faces have been used to train software—without their consent. 44 So is a diverse database of faces really a good idea? Responding on Twitter to the announcement of Buolamwini and Gebru's 2018 study, an Indigenous Marine veteran voiced his concerns: "I hope facial recognition software has a problem identifying my face too. That'd come in handy when the police come rolling around with their facial recognition truck at peaceful demonstrations of dissent, cataloging all dissenters for 'safety and security.'" 45

Better detection of faces of color cannot be characterized as an unqualified good. More often than not, it is enlisted in the service of increased oppression, greater surveillance, and targeted violence. Buolamwini understands these potential harms and has developed an approach that works across all four domains of the matrix of domination to address the underlying issues of power that are playing out in facial analysis technology. Buolamwini and Gebru first quantified the disparities in the dataset—a technical audit, which falls in the disciplinary domain of the matrix of domination. Then, Buolamwini went on to launch the Algorithmic Justice League, an organization that works to highlight and intervene in instances of algorithmic bias. On behalf of the AJL, Buolamwini has produced viral poetry projects and given TED talks—taking action in the hegemonic domain, the realm of culture and ideas. She has advised on legislation and professional standards for the field of computer vision and called for a moratorium on facial analysis in policing on national media and in Congress. 46 These are actions operating in the structural domain of the matrix of domination—the realm of law and policy. Throughout these efforts, the AJL works with students and researchers to help guide and shape their own work—the interpersonal domain. Taken together, Buolamwini's various initiatives demonstrate how any "solution" to bias in algorithms and datasets must tackle more than technical limitations. In addition, they present a compelling model for the data scientist as public intellectual—who, yes, works on technical audits and fixes, but also works on cultural, legal, and political efforts.

While equitable representation—in datasets and data science workforces—is important, it remains window dressing if we don't also transform the institutions that produce and reproduce those biased outcomes in the first place. As doctoral health student Arrianna Planey, quoting Robert M. Young, states, "A racist society will give you a racist science." 47 We cannot filter out the downstream effects of sexism and racism without also addressing their root cause.

Data Science for Whom? 

One of the downstream effects of the privilege hazard—the risks incurred when people from dominant groups create most of our data products—is not only that datasets are biased or unrepresentative, but that they never get collected at all. Mimi Onuoha—an artist, designer, and educator—has long been asking who questions about data science. Her project, The Library of Missing Datasets (figure 1.4), is a list of datasets that one might expect to already exist in the world, because they help to address pressing social issues, but that in reality have never been created. The project exists as a website and as an art object. The latter consists of a file cabinet filled with folders labeled with phrases like: "People excluded from public housing because of criminal records," "Mobility for older adults with physical disabilities or cognitive impairments," and "Total number of local and state police departments using stingray phone trackers (IMSI-catchers)." Visitors can tab through the folders and remove any particular folder of interest, only to reveal that it is empty. They all are. The datasets that should be there are "missing."

Figure 1.4

The Library of Missing Datasets, by Mimi Onuoha (2016), is a list of datasets that are not collected because of bias, lack of social and political will, and structural disregard. Courtesy of Mimi Onuoha. Photo by Brandon Schulman.

By compiling a list of the datasets that are missing from our "otherwise data-saturated" world, Onuoha explains, "we find cultural and colloquial hints of what is deemed important" and what is not. "Spots that we've left blank reveal our hidden social biases and indifferences," she continues. And by calling attention to these datasets as "missing," she also calls attention to how the matrix of domination encodes these "social biases and indifferences" across all levels of society. 48 Along similar lines, foundations like Data2X and books like Invisible Women have advanced the idea of a systematic "gender data gap," which arises because the majority of research data in scientific studies is based on men's bodies. The downstream effects of the gender data gap range from annoying—cell phones slightly too large for women's hands, for example—to fatal. Until recently, crash test dummies were designed in the size and shape of men, an oversight that meant that women had a 47 percent higher chance of car injury than men. 49

The who question in this case is: Who benefits from data science and who is overlooked? Examining those gaps can sometimes mean calling out missing datasets, as Onuoha does; characterizing them, as Invisible Women does; and advocating for filling them, as Data2X does. At other times, it can mean collecting the missing data yourself. Lacking comprehensive data about women who die in childbirth, for example, ProPublica decided to resort to crowdsourcing to learn the names of the estimated seven hundred to nine hundred US women who died in 2016. 50 As of 2019, they've identified only 140. Or, for another example: in 1998, youth living in Roxbury—a neighborhood known as "the heart of Black culture in Boston" 51—were sick and tired of inhaling polluted air. They led a march demanding clean air and better data collection, which led to the creation of the AirBeat community monitoring project. 52

Scholars have proposed various names for these instances of ground-up data collection, including counterdata or agonistic data collection, data activism, statactivism, and citizen science (when in the service of environmental justice). 53 Whatever it's called, it's been going on for a long time. In 1895, civil rights activist and pioneering data journalist Ida B. Wells assembled a set of statistics on the epidemic of lynching that was sweeping the United States. 54 She accompanied her data with a meticulous exposé of the fraudulent claims made by white people—typically, that a rape, theft, or assault of some kind had occurred (which it hadn't in most cases) and that lynching was a justified response. Today, an organization named after Wells—the Ida B. Wells Society for Investigative Reporting—continues her mission by training up a new generation of journalists of color in the skills of data collection and analysis. 55

A counterdata initiative in the spirit of Wells is taking place just south of the US border, in Mexico, where a single woman is compiling a comprehensive dataset on femicides—gender-related killings of women and girls. 56 Maria Salguero, who also goes by the name Princesa, has logged more than five thousand cases of femicide since 2016. 57 Her work provides the most accessible information on the subject for journalists, activists, and victims' families seeking justice.

The issue of femicide in Mexico rose to global visibility in the mid-2000s with widespread media coverage about the deaths of poor and working-class women in Ciudad Juárez. A border town, Juárez is the site of more than three hundred maquiladoras: factories that employ women to assemble goods and electronics, often for low wages and in substandard working conditions. Between 1993 and 2005, nearly four hundred of these women were murdered, with around a third of those murders exhibiting signs of exceptional brutality or sexual violence. Convictions were obtained in only three of those deaths. In response, a number of activist groups like Ni Una Más (Not One More) and Nuestras Hijas de Regreso a Casa (Our Daughters Back Home) were formed, largely motivated by mothers demanding justice for their daughters, often at great personal risk to themselves. 58

These groups succeeded in gaining the attention of the Mexican government, which established a Special Commission on Femicide. But despite the commission and the fourteen volumes of information about femicide that it produced, and despite a 2009 ruling against the Mexican state by the Inter-American Court of Human Rights, and despite a United Nations Symposium on Femicide in 2012, and despite the fact that sixteen Latin American countries have now passed laws defining femicide—despite all of this, deaths in Juárez have continued to rise. 59 In 2009 a report pointed out that one of the reasons that the issue had yet to be sufficiently addressed was the lack of data. 60 Needless to say, the problem remains.

How might we explain the missing data around femicides in relation to the four domains of power that constitute Collins's matrix of domination? As is true in so many cases of data collected (or not) about women and other minoritized groups, the collection environment is compromised by imbalances of power.

The most grave and urgent manifestation of the matrix of domination is within the interpersonal domain, in which cis and trans women become the victims of violence and murder at the hands of men. Although law and policy (the structural domain) have recognized the crime of femicide, no specific policies have been implemented to ensure adequate information collection, either by federal agencies or local authorities. Thus the disciplinary domain, in which law and policy are enacted, is characterized by a deferral of responsibility, a failure to investigate, and victim blaming. This persists in a somewhat recursive fashion because there are no consequences imposed within the structural domain. For example, the Special Commission's definition of femicide as a "crime of the state" speaks volumes to how the government of Mexico is deeply complicit through inattention and indifference. 61

Of course, this inaction would not have been tolerated without the assistance of the hegemonic domain—the realm of media and culture—which presents men as strong and women as subservient, men as public and women as private, trans people as deviating from "essential" norms, and nonbinary people as nonexistent altogether. Indeed, government agencies have used their public platforms to blame victims. Following the femicide of twenty-two-year-old Mexican student Lesvy Osorio in 2017, researcher Maria Rodriguez-Dominguez documented how the Public Prosecutor's Office of Mexico City shared on social media that the victim was an alcoholic and drug user who had been living out of wedlock with her boyfriend. 62 This led to justified public backlash, and to the hashtag #SiMeMatan (If they kill me), which prompted sarcastic tweets such as "#SiMeMatan it's because I liked to go out at night and drink a lot of beer." 63

It is into this data collection environment, characterized by extremely asymmetrical power relations, that Maria Salguero has inserted her femicides map. Salguero manually plots a pin on the map for every femicide that she collects through media reports or through crowdsourced contributions (figure 1.5a). One of her goals is to "show that these victims [each] had a name and that they had a life," and so Salguero logs as many details as she can about each death. These include name, age, relationship with the perpetrator, mode and place of death, and whether the victim was transgender, as well as the full content of the news report that served as the source. Figure 1.5b shows a detailed view for a single report from an unidentified transfemicide, including the date, time, location, and media article about the killing. It can take Salguero three to four hours a day to do this unpaid work. She takes occasional breaks to preserve her mental health, and she typically has a backlog of a month's worth of femicides to add to the map.

Although media reportage and crowdsourcing are imperfect ways of collecting data, 
this particular map, created and maintained by a single person, fills a vacuum created 
by her national government. The map has been used to help find missing women, and 
Salguero herself has testified before Mexico's Congress about the scope of the problem. 
Figure 1.5

Maria Salguero's map of femicides in Mexico (2016-present) can be found at https://feminicidiosmx.crowdmap.com/. (a) Map extent showing the whole country. (b) A detailed view of Ciudad Juárez with a focus on a single report of an anonymous transfemicide. Salguero crowdsources points on the map based on reports in the press and reports from citizens to her. Courtesy of Maria Salguero.

Figure 1.5 (continued)

[Panel (b): detail view of a single map report titled "#Transfeminicidio Identidad Reservada" (#Transfemicide, identity withheld). Its fields are Name; Date (15/08/2017 23:36); Place (Pedro Meneses Hoyos, Ciudad Juárez, Juárez, Chihuahua, 32370, Mexico); Facts (the full text of a local news report dated Tuesday, August 15, 2017); Latitude (31.680782); and Longitude (-106.414466).]


Salguero is not affiliated with an activist group, but she makes her data available to 
activist groups for their efforts. Parents of victims have called her to give their thanks 
for making their daughters visible, and Salguero affirms this function as well: "This 
map seeks to make visible the sites where they are killing us, to find patterns, to bolster 
arguments about the problem, to georeference aid, to promote prevention and try to 
avoid femicides." 

It is important to make clear that the example of missing data about femicides in Mexico is not an isolated case, either in terms of subject matter or geographic location. The phenomenon of missing data is a regular and expected outcome in all societies characterized by unequal power relations, in which a gendered, racialized order is maintained through willful disregard, deferral of responsibility, and organized neglect for data and statistics about those minoritized bodies who do not hold power. So too are examples of individuals and communities using strategies like Salguero's to fill in the gaps left by these missing datasets—in the United States as around the world. 64 If "quantification is representation," as data journalist Jonathan Stray asserts, then this offers one way to hold those in power accountable. Collecting counterdata demonstrates how data science can be enlisted on behalf of individuals and communities that need more power on their side. 65

Data Science with Whose Interests and Goals? 

Far too often, the problem is not that data about minoritized groups are missing but the reverse: the databases and data systems of powerful institutions are built on the excessive surveillance of minoritized groups. This results in women, people of color, and poor people, among others, being overrepresented in the data that these systems are premised upon. In Automating Inequality, for example, Virginia Eubanks tells the story of the Allegheny County Office of Children, Youth, and Families in western Pennsylvania, which employs an algorithmic model to predict the risk of child abuse in any particular home. 66 The goal of the model is to remove children from potentially abusive households before abuse occurs; this would appear to be a very worthy goal. As Eubanks shows, however, inequities result. For wealthier parents, who can more easily access private health care and mental health services, there is simply not that much data to pull into the model. For poor parents, who more often rely on public resources, the system scoops up records from child welfare services, drug and alcohol treatment programs, mental health services, Medicaid histories, and more. Because there are far more data about poor parents, they are oversampled in the model, and so their children are overtargeted as being at risk for child abuse—a risk that results in children being removed from their families and homes. Eubanks argues that the model "confuse[s] parenting while poor with poor parenting."

This model, like many, was designed under two flawed assumptions: (1) that more data is always better and (2) that the data are a neutral input. In practice, however, the reality is quite different. The higher the proportion of poor parents in the database, and the more complete their data profiles, the more likely the model is to find fault with poor parents. And data are never neutral; they are always the biased output of unequal social, historical, and economic conditions: this is the matrix of domination once again. 67 Governments can and do use biased data to marshal the power of the matrix of domination in ways that amplify its effects on the least powerful in society. In this case, the model becomes a way to administer and manage classism in the disciplinary domain—with the consequence that poor parents' attempts to access resources and improve their lives, when compiled as data, become the same data that remove their children from their care.

So this raises our next who question: Whose goals are prioritized in data science (and 
whose are not)? In this case, the state of Pennsylvania prioritized its bureaucratic goal 
of efficiency, which is an oft-cited reason for coming up with a technical solution to 
a social and political dilemma. Viewed from the perspective of the state, there were 
simply not enough employees to handle all of the potential child abuse cases, so it 
needed a mechanism for efficiently deploying limited staff—or so the reasoning goes. 
This is what Eubanks has described as a scarcity bias: the idea that there are not enough 
resources for everyone so we should think small and allow technology to fill the gaps. 
Such thinking, and the technological "solutions" that result, often meet the goals 
of their creators—in this case, the Allegheny County Office of Children, Youth, and 
Families—but not the goals of the children and families that it purports to serve. 

Corporations also place their own goals ahead of those of the people their products
purport to serve, supported by their outsize wealth and the power that comes
with it. For example, in 2012, the New York Times published an explosive article by 
Charles Duhigg, "How Companies Learn Your Secrets," 68 which soon became the stuff 
of legend in data and privacy circles. Duhigg describes how Andrew Pole, a data scientist
working at Target, was approached by men from the marketing department who
asked, "If we wanted to figure out if a customer is pregnant, even if she didn't want 
us to know, can you do that?" 69 He proceeded to synthesize customers' purchasing 
histories with the timeline of those purchases to give each customer a so-called pregnancy
prediction score (figure 1.6). 70 Evidently, pregnancy is the second major life
event, after leaving for college, that determines whether a casual shopper will become 
a customer for life. 

Target turned around and put Pole's pregnancy detection model into action in an 
automated system that sent discount coupons to possibly pregnant customers. Win-win—or
so the company thought, until a Minneapolis teenager's dad saw the coupons
for baby clothes that she was getting in the mail and marched into his local Target to 
read the manager the riot act. Why was his daughter getting coupons for pregnant 
women when she was only a teen?! 

It turned out that the young woman was indeed pregnant. Pole's model informed 
Target before the teenager informed her family. By analyzing the purchase dates of 
approximately twenty-five common products, such as unscented lotion and large bags 
of cotton balls, the model found a set of purchase patterns that were highly correlated 
Figure 1.6

Screenshot from a video of statistician Andrew Pole's presentation at Predictive Analytics World 
about Target's pregnancy detection model in October 2010, titled "How Target Gets the Most out 
of Its Guest Data to Improve Marketing ROI." He discusses the model at 47:50. Image by Andrew 
Pole for Predictive Analytics World. 


with pregnancy status and expected due date. But the win-win quickly became a 
lose-lose, as Target lost the trust of its customers in a PR disaster and the Minneapolis 
teenager lost far worse: her control over information related to her own body and 
her health. 

This story has been told many times: first by Pole, the statistician; then by Duhigg, 
the New York Times journalist; then by many other commentators on personal privacy 
and corporate overreach. But it is not only a story about privacy: it is also a story about 
gender injustice—about how corporations approach data relating to women’s bodies 
and lives, and about how corporations approach data relating to minoritized populations
more generally. Whose goals are prioritized in this case? The corporation's, of
course. For Target, the primary motivation was maximizing profit, and quarterly financial
reports to the board are the measurement of success. Whose goals are not prioritized?
The teenager's, and those of every other pregnant woman out there.

How did we get to the point where data science is used almost exclusively in the 
service of profit (for a few), surveillance (of the minoritized), and efficiency (amidst 
scarcity)? It's worth stepping back to make an observation about the organization of 
the data economy: data are expensive and resource-intensive, so only already powerful 
institutions—corporations, governments, and elite research universities—have the
means to work with them at scale. These resource requirements result in data science 
that serves the primary goals of the institutions themselves. We can think of these goals 
as the three Ss: science (universities), surveillance (governments), and selling (corporations).
This is not a normative judgment (e.g., "all science is bad") but rather an observation
about the organization of resources. If science, surveillance, and selling are the
main goals that data are serving, because that's who has the money, then what other 
goals and purposes are going underserved? 

Let's take "the cloud" as an example. As server farms have taken the place of paper 
archives, storing data has come to require large physical spaces. A project by the Center 
for Land Use Interpretation (CLUI) makes this last point plain (figure 1.7). In 2014, 
CLUI set out to map and photograph data centers around the United States, often 
in those seemingly empty in-between areas we now call exurbs. In so doing, it called 
attention to "a new kind of physical information architecture" sprawling across the 
United States: "windowless boxes, often with distinct design features such as an applique
of surface graphics or a functional brutalism, surrounded by cooling systems." The
environmental impacts of the cloud—in the form of electricity and air conditioning— 
are enormous. A 2017 Greenpeace report estimated that the global IT sector, which 
is largely US-based, accounted for around 7 percent of the world's energy use. This is 
more than the energy use of some of the largest countries in the world, including Russia, Brazil, and Japan. 71
Unless that energy comes from renewable sources (which the Greenpeace report shows 
that it does not), the cloud has a significant accelerating impact on global climate 
change. 

So the cloud is not light and it is not airy. And the cloud is not cheap. The cost of 
constructing Facebook's newest data center in Los Lunas, New Mexico, is expected to 
reach $1 billion. 72 The electrical cost of that center alone is estimated at $31 million 
per year. 73 These numbers return us to the question about financial resources: Who has 
the money to invest in centers like these? Only powerful corporations like Facebook 
and Target, along with wealthy governments and elite universities, have the resources 
to collect, store, maintain, analyze, and mobilize the largest amounts of data. Next, 
who is in charge of these well-resourced institutions? Disproportionately men, even 
more disproportionately white men, and even more than that, disproportionately rich 
white men. Want the data on that? Google's Board of Directors is 82 percent white men.
Facebook's board is 78 percent male and 89 percent white. The 2018 US Congress was
79 percent male—actually a better percentage than in previous years—and its members
had a median net worth five times that of the average American household. 74 These
are the people who experience the most privilege within the matrix



Figure 1.7 

Photographs from Networked Nation: The Landscape of the Internet in America, an exhibition by the 
Center for Land Use Interpretation staged in 2013. The photos show four data centers located in 
North Bergen, NJ; Dalles, OR; Ashburn, VA; and Lockport, NY (counterclockwise from top right). 
They show how the "cloud" is housed in remote locations and office parks around the country. 
Images by the Center for Land Use Interpretation. 
Figure 1.7 (continued)
of domination, and they are also the people who benefit the most from the current
status quo. 75 

In the past decade or so, many of these men at the top have described data as "the 
new oil." 76 It's a metaphor that resonates uncannily well—even more than they likely 
intended. The idea of data as some sort of untapped natural resource clearly points to 
the potential of data for power and profit once they are processed and refined, but it 
also helps highlight the exploitative dimensions of extracting data from their source— 
people—as well as their ecological cost. Just as the original oil barons were able to use 
their riches to wield outsized power in the world (think of John D. Rockefeller, J. Paul 
Getty, or, more recently, the Koch brothers), so too do the Targets of the world use 
their corporate gain to consolidate control over their customers. But unlike crude oil, 
which is extracted from the earth and then sold to people, data are both extracted from 
people and sold back to them—in the form of coupons like the one the Minneapolis 
teen received in the mail, or far worse. 77 

This extractive system creates a profound asymmetry between who is collecting, 
storing, and analyzing data, and whose data are collected, stored, and analyzed. 78 The 
goals that drive this process are those of the corporations, governments, and well-resourced
universities that are dominated by elite white men. And those goals are
neither neutral nor democratic—in the sense of having undergone any kind of participatory,
public process. On the contrary, focusing on those three Ss—science, surveillance,
and selling—to the exclusion of other possible objectives results in significant
oversights with life-altering consequences. Consider the Target example as the flip side 
of the missing data on maternal health outcomes. Put crudely, there is no profit to be 
made collecting data on the women who are dying in childbirth, but there is significant
profit in knowing whether women are pregnant.

How might we prioritize different goals and different people in data science? How 
might data scientists undertake a feminist analysis of power in order to tackle bias at its 
source? Kimberly Seals Allers, a birth justice advocate and author, is on a mission to do 
exactly that in relation to maternal and infant care in the United States. She followed 
the Serena Williams story with great interest and watched as Congress passed the Preventing
Maternal Deaths Act of 2018. This bill funded the creation of maternal health
review committees in every state and, for the first time, uniform and comprehensive 
data collection at the federal level. But even as more data have begun to be collected 
about maternal mortality, Seals Allers has remained frustrated by the public conversation:
"The statistics that are rightfully creating awareness around the Black maternal
mortality crisis are also contributing to this gloom and doom deficit narrative. White 
Figure 1.8

Irth is a mobile app and web platform focused on removing bias from birth (including prenatal, 
birth, and postpartum health care). Users post intersectional reviews of the care they received 
from individual nurses and doctors, as well as whole practices and hospitals. When parents-to-be
are searching for providers, they can consult Irth to see what kind of care people like them
received in the hands of specific caregivers. Wireframes from Irth's first prototype are shown here. 
Images by Kimberly Seals Allers and the Irth team, 2019. 


people are like, 'how can we save Black women?' And that's not the solution that we 
need the data to produce." 79 

Seals Allers—and her fifteen-year-old son, Michael—are working on their own data-driven
contribution to the maternal and infant health conversation: a platform and
app called Irth—from birth, but with the b for bias removed (figure 1.8). One of the 
major contributing factors to poor birth outcomes, as well as maternal and infant mortality,
is biased care. Hospitals, clinics, and caregivers routinely disregard Black women's
expressions of pain and wishes for treatment. 80 As we saw, Serena Williams’s own
story almost ended in this way, despite the fact that she is an international tennis star. 
To combat this, Irth operates like an intersectional Yelp for birth experiences. Users 
post ratings and reviews of their prenatal, postpartum, and birth experiences at specific
hospitals and in the hands of specific caregivers. Their reviews include important
details like their race, religion, sexuality, and gender identity, as well as whether they 
felt that those identities were respected in the care that they received. The app also 
has a taxonomy of bias and asks users to tick boxes to indicate whether and how they 
may have experienced different types of bias. Irth allows parents who are seeking care 
to search for a review from someone like them—from a racial, ethnic, socioeconomic, 
and/or gender perspective—to see how they experienced a certain doctor or hospital. 

Seals Allers's vision is that Irth will be both a public information platform, for individuals
to find better care, and an accountability tool, to hold hospitals and providers
responsible for systemic bias. Ultimately, she would like to present aggregated stories 
and data analyses from the platform to hospital networks to push for change grounded 
in women's and parents' lived experiences. "We keep telling the story of maternal mortality
from the grave," she says. "We have to start preventing those deaths by sharing
the stories of people who actually lived." 81 

Irth illustrates the fact that "doing good with data" requires being deeply attuned to 
the things that fall outside the dataset—and in particular to how datasets, and the data 
science they enable, too often reflect the structures of power of the world they draw 
from. In a world defined by unequal power relations, which shape both social norms 
and laws about how data are used and how data science is applied, it remains imperative
to consider who gets to do the "good" and who, conversely, gets someone else's
"good” done to them. 

Examine Power 

Data feminism begins by examining how power operates in the world today. This consists
of asking who questions about data science: Who does the work (and who is pushed
out)? Who benefits (and who is neglected or harmed)? Whose priorities get turned into 
products (and whose are overlooked)? These questions are relevant at the level of individuals
and organizations, and are absolutely essential at the level of society. The current
answer to most of these questions is "people from dominant groups," which has
resulted in a privilege hazard so acute that it explains the near-daily revelations about 
another sexist or racist data product or algorithm. The matrix of domination helps us to 
understand how the privilege hazard—the result of unequal distributions of power— 
plays out in different domains. Ultimately, the goal of examining power is not only to 
understand it, but also to be able to challenge and change it. In the next chapter, we 
explore several approaches for challenging power with data science. 



2 Collect, Analyze, Imagine, Teach 


Principle: Challenge Power 

Data feminism commits to challenging unequal power structures and working toward justice. 


In 1971, the Detroit Geographic Expedition and Institute (DGEI) released a provocative 
map, Where Commuters Run Over Black Children on the Pointes-Downtown Track. The map 
(figure 2.1) uses sharp black dots to illustrate the places in the community where the 
children were killed. On one single street corner, there were six Black children killed 
by white drivers over the course of six months. On the map, the dots blot out that 
entire block. 

The people who lived along the deadly route had long recognized the magnitude 
of the problem, as well as its profound impact on the lives of their friends and neighbors.
But gathering data in support of this truth turned out to be a major challenge.
No one was keeping detailed records of these deaths, nor was anyone making even 
more basic information about what had happened publicly available. "We couldn't 
get that information," explains Gwendolyn Warren, the Detroit-based organizer who 
headed the unlikely collaboration: an alliance between Black young adults from the 
surrounding neighborhoods and a group led by white male academic geographers from 
nearby universities. 1 Through the collaboration, the youth learned cutting-edge mapping
techniques and, guided by Warren, leveraged their local knowledge in order to
produce a series of comprehensive reports, covering topics such as the social and economic
inequities among neighborhood children and proposals for new, more racially
equitable school district boundaries. 

Compare the DGEI map with another map of Detroit made thirty years earlier, 
Residential Security Map (figure 2.2). Both maps use straightforward cartographic techniques:
an aerial view, legends and keys, and shading. But the similarities end there.
The maps differ in terms of visual style, of course. But more profound is how they 
diverge in terms of the worldviews of their makers and the communities they seek to 
support. The latter map was made by the Detroit Board of Commerce, which consisted 
Figure 2.1

Where Commuters Run Over Black Children on the Pointes-Downtown Track (1971) is one image from 
a report, "Field Notes No. 3: The Geography of Children," which documented the racial inequities
of Detroit children. The map was created by Gwendolyn Warren, the administrative director of 
the Detroit Geographic Expedition and Institute (DGEI), in a collaboration between Black young 
adults in Detroit and white academic geographers that lasted from 1968-1971. The group worked 
together to map aspects of the urban environment related to children and education. Warren also 
worked to set up a free school at which young adults could take college classes in geography for 
credit. Courtesy of Gwendolyn Warren and the Detroit Geographical Expedition and Institute. 


of only white men, in collaboration with the Federal Home Loan Bank Board, which 
consisted mostly of white men. Far from emancipatory, this map was one of the earliest 
instances of the practice of redlining, a term used to describe how banks rated the risk of 
granting loans to potential homeowners on the basis of neighborhood demographics 
(specifically race and ethnicity), rather than individual creditworthiness. 

Redlining gets its name because the practice first involved drawing literal red lines 
on a map. (Sometimes the areas were shaded red instead, as in the map in figure 2.2.) 
All of Detroit's Black neighborhoods fall into red areas on this map because housing 
discrimination and other forms of structural oppression predated the practice. 2 But 
denying home loans to the people who lived in these neighborhoods reinforced those 
Figure 2.2

Residential Security Map, a redlining map of Detroit published in 1939. Created as a collaboration between the (all white and male) Detroit Chamber
of Commerce and the (majority white and male) Federal Home Loan Bank Board, the red colors signify neighborhoods that these institutions
deemed at "high risk" for bank loans. Courtesy of Robert K. Nelson, LaDale Winling, Richard Marciano, Nathan Connolly, et
al., Mapping Inequality: Redlining in New Deal America. 
existing inequalities and, as decades of research have shown, was directly responsible
for making them worse. 3 

Early twentieth-century redlining maps had an aura very similar to the "big data" 
approaches of today. These high-tech, scalable "solutions" were deployed across the 
nation, and they were one method among many that worked to ensure that wealth 
remained attached to the racial category of whiteness. 4 At the same time that these 
maps were being made, the insurance industry, for example, was implementing similar
data-driven methods for granting (or denying) policies to customers based on
their demographics. Zoning laws that were explicitly based on race had already been 
declared unconstitutional; but within neighborhoods, so-called covenants were nearly 
as exclusionary and completely legal. 5 This is a phenomenon that political philosopher 
Cedric Robinson famously termed racial capitalism, and it continues into the present in 
the form of algorithmically generated credit scores that are consistently biased and in 
the consolidation of "the 1 percent” through the tax code, to give only two examples 
of many. 6 What's more, the benefits of whiteness accrue: "Whiteness retains its value 
as a 'consolation prize,'" civil rights scholar Cheryl Harris explains. "It does not mean
that all whites will win, but simply that they will not lose." 7 

Who makes maps and who gets mapped? The redlining map is one that secures the 
power of its makers: the white men on the Detroit Board of Commerce, their families, 
and their communities. This particular redlining map is even called Residential Security 
Map. But the title reflects more than a desire to secure property values. Rather, it reveals 
a broader desire to protect and preserve home ownership as a method of accumulating 
wealth, and therefore status and power, that was available to white people only. In far 
too many cases, data-driven "solutions" are still deployed in similar ways: in support 
of the interests of the people and institutions in positions of power, whose worldviews 
and value systems differ vastly from those of the communities whose data the systems 
rely upon. 8 

The DGEI map, by contrast, challenges this unequal distribution of data and power. 
It does so in three key ways. First, in the face of missing data, DGEI compiled its own 
counterdata. Warren describes how she developed relationships with "political people 
in order to use them as a means of getting information from the police department in 
order to find out exactly what time, where, how and who killed [each] child." 9 Second, 
the DGEI map plotted the data they collected with the deliberate aim of quantifying
structural oppression. They intentionally and explicitly focused on the problems
of "death, hunger, pain, sorrow and frustration in children," as they explain in the 
report. 10 Finally, the DGEI map was made by young Black people who lived in the 
community, under the leadership of a Black woman who was an organizer in the community,
with support provided by the academic geographers. 11 The identities of these
makers matter, their proximity to the subject matter matters, the terms of their collaboration
matter, and the leadership of the project matters. 12

For these reasons, the DGEI provides a model of the second principle of data feminism:
challenge power. Challenging power requires mobilizing data science to push back
against existing and unequal power structures and to work toward more just and equitable
futures. As we will discuss in this chapter, the goal of challenging power is closely
linked to the act of examining power, the first principle of data feminism. In fact, the 
first step of challenging power is to examine that power. But the next step—and the 
reason we have chosen to dedicate two principles to the topic of power—is to take 
action against an unjust status quo. 

Taking action can itself take many forms, and in this chapter we offer four starting 
points: (1) Collect: Compiling counterdata—in the face of missing data or institutional 
neglect—offers a powerful starting point, as we see in the example of the DGEI, or in
Maria Salguero's femicide maps discussed in chapter 1. (2) Analyze: Challenging power
often requires demonstrating inequitable outcomes across groups, and new computational
methods are being developed to audit opaque algorithms and hold institutions
accountable. (3) Imagine: We cannot focus only on inequitable outcomes, because then
we will never get to the root cause of injustice. In order to truly dismantle power, we
have to imagine our end point not as "fairness," but as co-liberation. (4) Teach: The
identities of data scientists matter, so how might we engage and empower newcomers
to the field in order to shift the demographics and cultivate the next generation of
data feminists? 

Analyze and Expose Oppression 

One can make a direct comparison between yesterday's redlining maps and today's risk 
assessment algorithms. The latter are used in many cities in the United States today to 
inform judgments about the length of a particular prison sentence, the amount of bail 
that should be set, and even whether bail should be set in the first place. The "risk" in 
their name has to do with the likelihood of a person detained by the police committing
a future crime. Risk assessment algorithms produce scores that influence whether
a person is sent to jail or set free, effectively altering the course of their life. 

But risk assessment algorithms, like redlining maps, are neither neutral nor objective.
In 2016, Julia Angwin led a team at ProPublica to investigate one of the most
widely used risk assessment algorithms in the United States, created by the company 
Northpointe (now Equivant). 13 Her team found that white defendants are more often 
mislabeled as low risk than Black defendants and, conversely, that Black defendants 
are mislabeled as high risk more often than white defendants. 14 Digging further into 
the process, the journalists uncovered a 137-question worksheet that each detainee is
required to fill out (figure 2.3). The detainee's answers feed into the software, in which 
they are compared with other data to determine that person's risk score. Although 
the questionnaire does not ask directly about race, it asks questions that, given the 
structural inequalities embedded in US culture, serve as proxies for race. These include 
questions like whether you were raised by a single mother, whether you have ever 
been suspended from school, or whether you have friends or family that have been 
arrested. In the United States, each of those questions is linked to a set of larger social, 
cultural, and political—and, more often than not, racial—realities. For instance, it has 
been demonstrated that 67 percent of Black kids grow up in single-parent households, 


The next few questions are about the family or caretakers that mainly raised you when growing up.

31. Which of the following best describes who principally raised you?

□ Both Natural Parents
□ Natural Mother Only
□ Natural Father Only
□ Relative(s)
□ Adoptive Parent(s)
□ Foster Parent(s)
□ Other arrangement

32. If you lived with both parents and they later separated, how old were you at the time?

□ Less than 5 □ 5 to 10 □ 11 to 14 □ 15 or older □ Does Not Apply

33. Was your father (or father figure who principally raised you) ever arrested, that you know of?

□ No □ Yes

34. Was your mother (or mother figure who principally raised you) ever arrested, that you know of?

□ No □ Yes

35. Were your brothers or sisters ever arrested, that you know of?

□ No □ Yes

36. Was your wife/husband/partner ever arrested, that you know of?

□ No □ Yes

37. Did a parent or parent figure who raised you ever have a drug or alcohol problem?

□ No □ Yes

38. Was one of your parents (or parent figure who raised you) ever sent to jail or prison?

□ No □ Yes


Figure 2.3 

Equivant's risk assessment algorithm is called Correctional Offender Management Profiling for 
Alternative Sanctions (COMPAS) and is derived from a defendant's answers to a 137-question 
survey about their upbringing, personality, family, and friends, including many questions that 
can be considered proxies for race, such as whether they were raised by a single mother. Note 
that evidence of family criminality would not be admissible evidence in a court case for a crime 
committed by an individual, but here it is used as a factor in making important decisions about 
a person's freedom. Courtesy of Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner for 
ProPublica, 2016. 
whereas only 25 percent of white kids do. 15 Similarly, studies have shown that Black
kids are punished more harshly than are white kids for the same minor infractions, 
starting as early as preschool. 16 So, though the algorithm's creators claim that they do 
not consider race, race is embedded into the data they are choosing to employ. What's 
more, they are using that information to further disadvantage Black people, whether 
because of an erroneous belief in the objectivity of their data, or because they remain 
unmoved by the evidence of how racism is operating through their technology. 
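ProPublica's comparison of mislabeling rates can be made concrete. Among people who did not go on to reoffend, how often was each group labeled high risk (the false positive rate)? Among people who did reoffend, how often was each group labeled low risk (the false negative rate)? The Python sketch below computes both rates per group. It is a minimal illustration under stated assumptions, not ProPublica's actual analysis: the records are invented, the groups are labeled simply A and B, and the function name `error_rates_by_group` is hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records: (group, labeled_high_risk, reoffended).
# These values are invented for illustration; they are not ProPublica's data.
records = [
    ("A", True, False), ("A", True, False), ("A", False, False),
    ("A", True, True), ("A", False, True),
    ("B", False, False), ("B", False, False), ("B", True, False),
    ("B", False, True), ("B", False, True), ("B", True, True),
]

def error_rates_by_group(rows):
    """Return {group: (false_positive_rate, false_negative_rate)}."""
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for group, high_risk, reoffended in rows:
        c = counts[group]
        if reoffended:
            c["pos"] += 1
            if not high_risk:
                c["fn"] += 1  # reoffended but was labeled low risk
        else:
            c["neg"] += 1
            if high_risk:
                c["fp"] += 1  # did not reoffend but was labeled high risk
    return {
        g: (c["fp"] / c["neg"] if c["neg"] else 0.0,
            c["fn"] / c["pos"] if c["pos"] else 0.0)
        for g, c in counts.items()
    }

for group, (fpr, fnr) in sorted(error_rates_by_group(records).items()):
    print(f"group {group}: false positive rate {fpr:.2f}, "
          f"false negative rate {fnr:.2f}")
```

In this toy data, group A is wrongly labeled high risk more often (2 of 3 people who did not reoffend) and group B is wrongly labeled low risk more often (2 of 3 people who did reoffend), the same shape as the disparity ProPublica reported.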

Sociologist Ruha Benjamin has a term for these situations: the New Jim Code—where
software code and a false sense of objectivity come together to contain and control
the lives of Black people, and of other people of color. 17 In this regard, the redlining
map and the Equivant risk assessment algorithm share some additional similarities.
Both use aggregated data about social groups to make decisions about individuals:
Should we grant a loan to this person? What's the risk that this person will reoffend?
Furthermore, both use past data to predict future behavior—and to constrain
it. In both cases, the past data in question (like segregated housing patterns or single 
parentage) are products of structurally unequal conditions. These unequal conditions 
are true across large social groups, and yet the technology uses those data as predictive
elements that will influence one person's future. Surya Mattu, a former ProPublica
reporter who worked on the story, makes this point directly: "Equivant didn't account 
for the fact that African Americans are more likely to be arrested by the police regardless
of whether they committed a crime or not. The system makes an assumption
that if you have been arrested you are probably at higher risk." 18 This is one of the 
challenges of using data about people as an input into a system: the data are never 
"raw." Data are always the product of unequal social relations—relations affected by 
centuries of history. As computer scientist Ben Green states, "Although most people 
talk about machine learning's ability to predict the future, what it really does is predict
the past." 19 Effectively, such "predictive" software reinforces existing demographic
divisions, amplifying the social inequities that have limited certain groups for generations.
The danger of the New Jim Code is that these findings are actively promoted
as objective, and they track individuals and groups through their lives and limit their 
future potential. 

But machine learning algorithms don't just predict the past; they also reflect current 
social inequities. A less well-known finding from the ProPublica investigation of Equivant,
for example, is that it also surfaced significantly different treatment of women by
the algorithm. Due to a range of factors, women tend to recidivate—to commit new 
crimes—less than men do. That means the risk scale for women "is such that somebody 
with a high risk score that's a woman is generally about the level of a medium risk score 
for a man. So, it's actually really shocking that judges are looking at these and thinking
that high risk means the same thing for a man and a woman when it doesn't," explains 
lead reporter Julia Angwin. 20 
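Angwin's point about scores for women and men is a question of calibration: does the same score correspond to the same observed reoffense rate in each group? A minimal sketch of that check follows, again on invented toy records. The score bands, the records, and the function name `reoffense_rate_by_score` are hypothetical, and this is not ProPublica's method or data.

```python
from collections import defaultdict

# Hypothetical toy records: (group, score_band, reoffended).
# Invented for illustration only.
records = [
    ("men", "high", True), ("men", "high", True), ("men", "high", False),
    ("men", "medium", True), ("men", "medium", False), ("men", "medium", False),
    ("women", "high", True), ("women", "high", False), ("women", "high", False),
    ("women", "medium", False), ("women", "medium", False),
    ("women", "medium", False),
]

def reoffense_rate_by_score(rows):
    """Return {(group, score_band): share of people in that cell who reoffended}."""
    totals = defaultdict(int)
    reoffenses = defaultdict(int)
    for group, band, reoffended in rows:
        totals[(group, band)] += 1
        reoffenses[(group, band)] += int(reoffended)
    return {key: reoffenses[key] / totals[key] for key in totals}

rates = reoffense_rate_by_score(records)
for key in sorted(rates):
    print(key, f"{rates[key]:.2f}")
```

With these invented numbers, a high score for women corresponds to the same observed rate (1 in 3) as a medium score for men, the kind of mismatch a judge reading a single "high risk" label would not see.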

Angwin decided to focus the story on race in part because of the prior work of criminologists
such as Kristy Holtfreter, which had already highlighted some of these gender
differentials. 21 But there was another factor at play in her editorial decision: workplace 
sexism faced by women reporters like Angwin herself. Angwin explains how she had 
always been wary of working on stories about women and gender because she wanted 
to avoid becoming pigeonholed as a reporter who only worked on stories about women 
and gender. But, she explains, "one of the things I woke up to during the #MeToo 
movement was how many decisions like that I had made over the years"—an
internalized form of oppression that had discouraged her from covering those important issues.
In early 2018, when we conducted this interview, Angwin was hiring for her own data 
journalism startup, the Markup, founded with a goal of using data-driven methods 
to investigate the differential harms and benefits of new technologies on society. She 
was encouraged to see how many job candidates of all genders were pitching stories 
on issues relating to gender inequality. "In the era of data and AI, the challenge is that 
accountability is hard to prove and hard to trace," she explains. "The challenge for 
journalism is to try to make as concrete as possible those linkages when we can so we 
can show the world what the harms are."

Angwin is pointing out a tricky issue that is unlikely to go away. The field of
journalism has long prided itself on "speaking truth to power." But today, the location of
that power has shifted from people and corporations to the datasets and models that
they create and employ. These datasets and models require new methods of
interrogation, particularly when they—like Equivant's—are proprietary. How does one report
on a black box, as these harmful algorithms are sometimes described? 22 Much like the 
situation encountered by Gwendolyn Warren when she looked into the data on the 
Detroit children's deaths, or like Maria Salguero when she started logging femicides in 
Mexico, ProPublica found no existing studies that examined whether the risk scores 
were racially biased, or existing datasets they could use to point them to answers. To 
write the risk assessment story, ProPublica had to assemble a dataset of their own. The 
researchers looked at ten thousand criminal defendants from a single county in Florida
and compared their recidivism risk scores with whether each person actually reoffended
within a two-year period. After doing some initial exploratory analysis, they created their own
regression model that considered race, age, criminal history, future recidivism, charge 
degree, and gender. They found that age, race, and gender were the strongest predictors 
of who received a high risk score—with Black defendants 77 percent more likely than 
white ones to receive a higher violent recidivism score. Their analysis also included
creating models to test the overall accuracy of the COMPAS model over time and an 
investigation of errors to see if there were racial differences in the distribution of false 
positives and false negatives. As it turns out, there were: the system was more likely to 
predict that white people would not commit additional crimes if released, when they 
actually did recidivate. 23 
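The error-rate comparison described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each record holds a group label, a binary "high risk" prediction, and a binary two-year reoffense outcome; the function `error_rates_by_group` and the toy records are hypothetical and are not ProPublica's code or data. A false positive here is someone labeled high risk who did not reoffend; a false negative is someone labeled low risk who did.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    Each record is (group, predicted_high_risk, reoffended), the last two bools.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for group, predicted_high, reoffended in records:
        c = counts[group]
        if predicted_high and reoffended:
            c["tp"] += 1  # flagged and did reoffend
        elif predicted_high:
            c["fp"] += 1  # flagged but did not reoffend
        elif reoffended:
            c["fn"] += 1  # not flagged but did reoffend
        else:
            c["tn"] += 1  # not flagged and did not reoffend
    rates = {}
    for group, c in counts.items():
        did_not_reoffend = c["fp"] + c["tn"]
        did_reoffend = c["fn"] + c["tp"]
        fpr = c["fp"] / did_not_reoffend if did_not_reoffend else float("nan")
        fnr = c["fn"] / did_reoffend if did_reoffend else float("nan")
        rates[group] = (fpr, fnr)
    return rates


# Hypothetical toy records: (group, predicted_high_risk, reoffended_within_two_years)
toy = [
    ("A", True, False), ("A", True, False), ("A", True, True), ("A", False, False),
    ("B", False, True), ("B", False, True), ("B", True, True), ("B", False, False),
]

for group, (fpr, fnr) in sorted(error_rates_by_group(toy).items()):
    print(f"group {group}: false positive rate {fpr:.2f}, false negative rate {fnr:.2f}")
```

In this toy data both groups have the same overall accuracy (two errors out of four), yet group A absorbs all of the false positives and group B all of the false negatives. Breaking errors down by type is what lets an audit surface that kind of asymmetry even when a single accuracy figure looks even-handed.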

Angwin and her coauthors used data science to challenge data science. By collecting 
missing data and reverse-engineering the algorithm that was judging each defendant's 
risk, they were able to prove systemic racial bias. This analysis method is called
auditing algorithms, and it is increasingly being used in journalism and in academic research
in order to show how the harms and benefits of automated systems are differentially
distributed. Computational journalism researcher Nicholas Diakopoulos has proposed
that work like this become formalized into an algorithm accountability beat, which
would help to make the practice more widespread. 24 He and computer scientist Sorelle
Friedler have asserted that algorithms need to be held "publicly accountable" for their 
consequences, and the press is one place where this accounting can take place. 25 By 
providing proof of how racism and sexism, among other oppressions, create unequal 
outcomes across social groups, analyzing data is a powerful strategy for challenging 
power and working toward justice. 

The Pitfalls of Proof 

Let's pause here for a feminist who question, as we introduced in chapter 1. Who is it, 
exactly, that needs to be shown the harms of such differentials of power? And what 
kind of proof do they require to believe that oppression is real? Women who
experience instances of sexism, as Angwin did in her workplace, already know the harms of
that oppressive behavior. The young adults whom Gwendolyn Warren worked with 
in Detroit already knew intimately that the white commuters were killing their Black 
neighbors and friends. They had no need to prove to their own communities that
structural racism was a factor in these deaths. Rather, their goal in partnering with the DGEI
was to prove the structural nature of the problem to those in positions of power. Those 
dominant groups and institutions were the ones that, by privileging their own social, 
political, and economic interests, bore much of the responsibility for the problem; and 
they also, because of the phenomenon we have described as a privilege hazard, were 
unlikely to see that such a problem existed in the first place. The theory of change that
motivates these efforts to use data as evidence, or "proof," is that by being made aware 
of the extent of the problem, those in power will be prompted to take action. 
These kinds of data-driven revelations can certainly be compelling. When the
analysis appears in a high-profile newspaper or blog or TV show (in other words: a place
white enough and male enough to be considered mainstream), it can indeed prompt 
people in power to act. The ProPublica story on risk assessment algorithms, for
example, prompted a New York City council member to propose an algorithmic
accountability bill. Enacted in 2018, the bill became the first legal measure to tackle algorithmic
discrimination in the United States and led to the creation of a task force focused on
"equity and fairness" in city algorithms. 26 Should the city implement some of the task
force's recommendations, it would influence the work of software vendors, as well as
legislation in other cities. This path of influence—from community problem to
gathering proof to informed reporting to policy change—represents the best aspirations of
speaking truth to power. 27 

While analyzing and exposing oppression in order to hold institutions
accountable can be extremely useful, its efficacy comes with two caveats. Proof can just as
easily become part of an endless loop if not accompanied by other tools of
community engagement, political organizing, and protest. Any data-based evidence can be
minimized because it is not "big" enough, not "clean" enough, or not "newsworthy" 
enough to justify a meaningful response from institutions that have a vested interest 
in maintaining the status quo. 28 As we saw in chapter 1, Maria Salguero's data on
femicides were augmented by government commissions, reports from international
agencies, and rulings of international courts. But none of those data-gathering efforts have
been enough to prompt comprehensive action. 

Another feminist who question: On whom is the burden of proof placed? In 2015,
communications researcher Candice Lanius wrote a widely shared blog post, "Fact 
Check: Your Demand for Statistical Proof is Racist," in which she summarizes the ample 
research on how those in positions of power accept anecdotal evidence from those like 
themselves, but demand endless statistics from minoritized groups. 29 In those cases, 
she argues convincingly, more data will never be enough. 

Proof can also unwittingly compound the harmful narratives—whether sexist or
racist or ableist or otherwise oppressive—that are already circulating in the culture,
inadvertently contributing to what are known as deficit narratives. These narratives reduce
a group or culture to its "problems," rather than portraying it with the strengths,
creativity, and agency that people from those cultures possess. For example, in their
book Indigenous Statistics, Maggie Walter and Chris Andersen describe how statistics
used by settler colonial groups to describe Indigenous populations have mainly
functioned as "documentation of difference, deficit, and dysfunction." 30 This can occur
even when the creators have good intentions—for example, as Kimberly Seals Allers 
notes (see chapter 1), a great deal of the media reporting on Black maternal mortality
data falls into the deficit narrative category. It portrays Black women as victims and 
fails to amplify the efforts of the Black women who have been working on the issue 
for decades. 

This goes for gender data as well. "What little data we collect about women tends 
to be either about their experience of violence or reproductive health," explains Nina 
Rabinovitch Blecker, who directs communications for Data2X, a nonprofit aimed at 
improving the quality of data related to gender in a global context. 31 The current data 
encourage additional deficit narratives—in which women are relentlessly and
reductively portrayed as victims of violent crimes like murder, rape, or intimate partner
violence. These narratives imply that the subjects of the data have no agency and need
"saving" from governments, international institutions, or concerned citizens. As one 
step to counteract that, Blecker chose to publish an example from Uruguay that didn't 
focus on violence, but rather on quantifying women's unseen contributions to the 
economy. 

So, though collecting counterdata and analyzing data to provide proof of
oppression remain worthy goals, it is equally important to remain aware of how the subjects
of oppression are portrayed. Working with communities directly, which we talk more 
about in chapter 5, is the surest remedy to these harms. Indigenous researcher Maggie 
Walter explains that ownership of the process is key in order to stop the propagation 
of deficit narratives: "We [Indigenous people] must have real power in how statistics
about us are done—where, when and how." 33 Key too is a sustained attention to the 
ways in which communities themselves are already addressing the issues. These actions 
are often more creative, more effective, and more culturally grounded than the actions 
that any outside organization would take. 

Envision Equity, Imagine Co-liberation 

As the examples discussed thus far in this book clearly demonstrate, one of the most 
dangerous outcomes of the tools of data and data science being consolidated in the 
hands of dominant groups is that these groups are able to obscure their politics and 
their goals behind their technologies. Benjamin, author of Race after Technology:
Abolitionist Tools for the New Jim Code (mentioned earlier), describes this phenomenon as
the "imagined objectivity of data and technology" because data-driven systems like 
redlining and risk assessment algorithms are not really objective at all. 34 Her concept 
of imagined objectivity emphasizes the role that cultural assumptions and personal
preconceptions play in upholding this false belief: one imagines (wrongly) that datasets
and algorithms are less partial and less discriminatory than people and thus more 
"objective." 35 But as we discuss in chapter 1, these data products seem objective only
because the perspectives of those who produce them—elite, white men and the
institutions they control—pass for the default. Assumptions about objectivity are becoming a
major focus in data science and related fields as algorithm after algorithm is revealed to
be sexist, racist, or otherwise flawed. What can the people who design these
computational systems do to avoid these pitfalls? And what can everyone else do to help them
and hold them accountable? 

The quest for answers to these questions has prompted the development of a new 
area of research known as data ethics. It represents a growing interdisciplinary effort— 
both critical and computational—to ensure that the ethical issues brought about by 
our increasing reliance on data-driven systems are identified and addressed. Thus far, 
the major trend has been to emphasize the issue of "bias," and the values of
"fairness, accountability, and transparency" in mitigating its effects. 36 This is a promising
development, especially for technical fields that have not historically foregrounded 
ethical issues, and as funding mechanisms for research on data and ethics proliferate. 37 
However, as Benjamin's concept of imagined objectivity helps to show, addressing bias 
in a dataset is a tiny technological Band-Aid for a much larger problem. Even the
values mentioned here, which seek to address instances of bias in data-driven systems,
are themselves non-neutral, as they locate the source of the bias in individual people 
and specific design decisions. So how might we develop a practice that results in data- 
driven systems that challenge power at its source? 

The following chart (table 2.1) introduces an alternate set of orienting concepts for
the field: these are the six ideals that we believe should guide data ethics work. These
concepts all have legacies in intersectional feminist activism, collective organizing, and
critical thought, and they are unabashedly explicit in how they work toward justice.

Table 2.1
From data ethics to data justice

| Concepts That Secure Power (because they locate the source of the problem in individuals or technical systems) | Concepts That Challenge Power (because they acknowledge structural power differentials and work toward dismantling them) |
| --- | --- |
| Ethics | Justice |
| Bias | Oppression |
| Fairness | Equity |
| Accountability | Co-liberation |
| Transparency | Reflexivity |
| Understanding algorithms | Understanding history, culture, and context |

In the left-hand column, we list some of the major concepts that are currently
circulating in conversations about the uses of data and algorithms in public (and private)
life. These are a step forward, but they do not go far enough. On the right-hand side, we 
list adjacent concepts that emerge from a grounding in intersectional feminist activism 
and critical thought. The gap between these two columns represents a fundamental
difference in view of why injustice arises and how it operates in the world. The concepts
on the left are based on the assumption that injustice arises as a result of flawed
individuals or small groups ("bad apples," "racist cops," "brogrammers") or flawed
technical systems ("the algorithm/dataset did it"). Although flawed individuals and flawed
systems certainly exist, they are not the root cause of the problems that occur again and 
again in data and algorithms. 

What is the root cause? If you've read chapter 1, you know the answer: the matrix 
of domination, the matrix of domination, and the matrix of domination. The concepts 
on the left may do good work, but they ultimately keep the roots of the problem in 
place. In other words, they maintain the current structure of power, even if they don't 
intend to, because they let the matrix of domination off the hook. They direct data 
scientists' attention toward seeking technological fixes. Sometimes those fixes are
necessary and important. But as technology scholars Julia Powles and Helen Nissenbaum
assert, "Bias is real, but it's also a captivating diversion." 38 There is a more fundamental 
problem that must also be addressed: we do not all arrive in the present with equal 
power or privilege. Hundreds of years of history and politics and culture have brought 
us to the present moment. This is a reality of our lives as well as our data. A broader 
focus on data justice, rather than data ethics alone, can help to ensure that past
inequities are not distilled into black-boxed algorithms that, like the redlining maps of the
twentieth century, determine the course of people's lives in the twenty-first. 

In proposing this chart, we are not suggesting that ethics have no place in data 
science, that bias in datasets should not be addressed, or that issues of transparency 
should go ignored. 39 Rather, the main point is that the concepts on the left are
inadequate on their own to account for the root causes of structural oppression. By not
taking root causes into account, they limit the range of responses possible to challenge
power and work toward justice. In contrast, the concepts on the right start from the 
basic feminist belief that oppression is real, historic, ongoing, and worth dismantling. 

Media theorist and designer Sasha Costanza-Chock proposes a restorative 
approach to data justice. 40 Drawing from theories of restorative justice—meaning 
that decisions should be made in ways that recognize and rectify any harms of the 
past—Costanza-Chock asserts that any notion of algorithmic fairness must also
acknowledge the systematic nature of the unfairness that has long been perpetrated by certain
groups on others. They give the example of college admissions—a topic that always 
seems to be in the news, not least because it's a major mechanism of protecting
privilege. 41 A restorative approach to college admissions would entail making decisions 
about who gets admitted in the present on the basis of who was historically not
admitted in the past—like women, who were excluded from MIT, where Costanza-Chock
teaches, for decades. According to this model, a "fair" present-day entering class that 
accounts for history might be composed of 90 percent women and people of color. 42 

Does this approach make fairness political? Emphatically yes, because all systems are 
political. In fact, the appeal to avoid politics is a very familiar way for those in power 
to attempt to hold onto it. 43 The ability to sidestep politics is a privilege in itself—held 
only by those whose existence does not challenge the status quo. If you are a Black 
woman or a Muslim man or a transgender service member and you live in the United 
States today, your being in the world is political, whether or not you want it to be. 44 So 
rather than design algorithms that purport to be "color-blind" (since color-blindness 
is of course a myth), Costanza-Chock explains that we should be designing algorithms 
that are just. 45 This means shifting from the ahistorical notion of fairness to a model
of equity. 

Equity is justice of a specific flavor, and it is different from equality. Equality is
measured from a starting point in the present: t = 0, where t is time and t = 0 marks the
present moment, with no time elapsed. Based on this formula, the principle of equality would
hold that resources and/or punishments should be doled out according to what is
happening in the present moment—the time when t = 0. But this formula for equal
treatment means that those who are ahead in the present can go further, achieve more, and
stay on top, whereas those who start out behind can never catch up. Kiddada Green, 
executive director of the Black Mothers Breastfeeding Association, makes the case that 
in a country where Black babies are dying at twice the rate of white babies, equality 
is actually systematically unfair: "There is a level of political correctness in America 
that causes some people to believe that equality is the way to go. Even when equality 
is unfair, some say that it's the right thing to do." 46 Working toward a world in which 
everyone is treated equitably, not equally, means taking into account these present 
power differentials and distributing (or redistributing) resources accordingly. Equity is 
much harder to model computationally than equality—as it needs to take time, history, 
and differential power into account—but it is not impossible. 47 

This difficulty also underscores the point that bias (in individuals, in datasets, in 
statistical models, or in algorithms) is not a strong enough concept in which to anchor 
ideas about equity and justice. In writing about the creation of New York's Welfare
Management System in the early 1970s, for example, Virginia Eubanks observes:
"These early big data systems were built on a specific understanding of what constitutes 
discrimination: personal bias." 48 The solution at the time was to remove the humans 
from the loop, and it remains so today: without potentially bad—in this case, racist— 
apples, there would be less discrimination. But this line of thinking illustrates what 
whiteness studies scholar Robin DiAngelo would call the new racism: the belief that 
racism is due to individual bad actors, rather than structures or systems. 49 In relation 
to welfare management, Eubanks emphasizes that this often meant replacing social 
workers, who were often women of color, and who had empathy and flexibility and 
listening skills, with an automated system that applies a set of rigid criteria, no matter 
what the circumstances. 

While bias remains a serious problem, it should not be viewed as something that 
can be fixed after the fact. Instead, we must look to understand and design systems 
that address the source of the bias: structural oppression. In truth, oppression is itself 
an outcome, one that results from the matrix of domination. In this model, majoritized
bodies are granted undeserved advantages and minoritized bodies must survive
undeserved hardships. Starting from the assumption that oppression is the problem, not
bias, leads to fundamentally different decisions about what to work on, who to work 
with, and when to stand up and say that a problem cannot and should not be solved 
by data and technology. 50 Why should we settle for retroactive audits of potentially 
flawed systems if we can design with a goal of co-liberation from the start? 51 And here, 
co-liberation doesn't mean "free the data," but rather "free the people." The people in 
question are not only those with less privilege, but also those with more privilege: data 
scientists, designers, researchers, and educators—in other words, those like ourselves— 
who play a role in upholding oppressive systems. 

The key to co-liberation is that it requires a commitment to and a belief in mutual 
benefit, from members of both dominant groups and minoritized groups; that's the co 
in the term. Too often, acts of data service performed by tech companies are framed as 
charity work (we discuss the limits of "data for good" in chapter 5). The frame of
co-liberation equalizes this exchange as a form of relationship building and demographic
healing. There is a famous saying credited to Aboriginal activists in Queensland,
Australia, from the 1970s: "If you have come here to help me, you are wasting your time.
But if you have come because your liberation is bound up with mine, then let us work 
together." 52 

What does this mean? As poet and community organizer Tawana Petty explains in 
relation to efforts around antiracism in the United States: "We need whites to firmly 
believe that their liberation, their humanity, is also dependent upon the destruction of
racism and the dismantling of white supremacy." 53 The same goes for gender: men are 
often not prompted to think about how unequal gender relations seep into the
institutions they dominate, resulting in harm for everyone.

This goal of co-liberation motivates the Our Data Bodies (ODB) project. Led by a 
group of five women, including Gangadharan and Petty, who sit at the intersection of 
academia and organizing work, this project is a community-centered initiative focused 
on data collection efforts that disproportionately impact minoritized people. Working 
with community organizations in three US cities, the ODB project has led participatory 
research initiatives and educational workshops, culminating in the recently released 
Digital Defense Playbook, a set of activities, tools, and tip sheets intended to be used by 
and for marginalized communities to understand how data-driven technologies impact 
their lives. 54 

Digital Defense Playbook was born out of many years of relationship-building and 
research, as well as a deliberate shift. The group explains in the playbook's
introduction, "We wanted to shift who gets to define problems around data collection, data
privacy, and data security—from elites to impacted communities; shine a light on how 
communities have been confronting data-driven problems as well as how they wish to 
confront these problems; and forge an analysis of data and data-driven technologies 
from and with allied struggles." 55 In so doing, the ODB project demonstrates how
co-liberation requires not only transparency of methods but also reflexivity: the ability to
reflect on and take responsibility for one's own position within the multiple,
intersecting dimensions of the matrix of domination. Along the way, the scholars and
organizers involved in the project decided to shift their research agenda, which had begun as
a general project about data profiling and resistance, to surveillance, in response to the 
problems voiced by the communities themselves. 56 

Even within big tech itself, there is evidence of an increasing sense of reflexivity
among employees about their role in creating harmful data systems. Employees have
pushed back against Google's work with the Department of Defense (DoD) on Project 
Maven, which uses AI to improve drone strike accuracy; Microsoft's decision to take 
$480 million from the Department of Defense to develop military applications of its 
augmented reality headset HoloLens; and Amazon's contract with US Immigration and 
Customs Enforcement (ICE) to develop its Rekognition platform for use in targeting 
individuals for detention and deportation at US borders. 57 This pushback has led to the 
cancelling of the Google and Microsoft projects, as well as political consciousness
raising across the sector, which we discuss further in the book's conclusion. 58

Designing datasets and data systems that dismantle oppression and work toward 
justice, equity, and co-liberation requires new tools in our collective toolbox. We have 
some good starting points; building more understandable algorithms is a laudable,
worthy research goal. And yet what we need to explain and account for are not only 
the inner workings of machine learning, but also the history, culture, and context 
that lead to discriminatory outputs in the first place. For example, it is not an isolated 
incident that facial analysis software couldn't "see" Joy Buolamwini's face, as we
discussed in chapter 1. It is not an isolated incident that the "Lena" image used to test
most image-processing algorithms was the centerfold from the November 1972 issue 
of Playboy, cropped demurely at the shoulders. 59 It is not an isolated incident that the 
women who worked on the ENIAC computer were not invited to the fiftieth
anniversary celebration in 1995. It is not an isolated incident that Christine Darden was not
promoted as quickly as her male coworkers. None of these are isolated incidents: they 
are connected data points and eminently measurable and predictable outcomes of the 
matrix of domination. But you can only detect the pattern if you know the history, 
culture, and context that surrounds it. 

Data people, generally speaking, have choices—choices in who they work for, which 
projects they work on, and what values they reject. 60 Starting from the assumption 
that oppression is the problem, equity is the path, and co-liberation is the desired goal 
leads to fundamentally different projects that challenge power at its source. It also
leads to different metrics of success. These extend beyond the efficiency of a database 
under load, the precision of a classification algorithm, or the size of a user base one 
year after launch. The success of a project designed with co-liberation in mind would 
also depend on how much trust was built between institutions and communities, how 
effectively those with power and resources shared their power and resources, how 
much learning happened in both directions, how much the people and organizations 
were transformed in the process, and how much inspiration for future work, together, 
was co-conspired. These metrics are a little more squishy than the numbers and
rankings that we tend to believe are our only option, but utterly and entirely measurable
nonetheless. 

Teach Data Like an Intersectional Feminist 

When Gwendolyn Warren and the DGEI researchers collected their data about
hit-and-runs on Black children or scoured Detroit playgrounds to weigh and measure the
broken glass they found, they were not only doing this work to make a data-driven case
for change. The "institute" part of the Detroit Geographic Expedition and Institute 
described the educational wing of the organization that ran classes in data collection, 
mapping, and cartography. It came about at Warren's insistence that the academic
geographers give something back to the community whose knowledge they were drawing
upon for their research. She recognized that while a single map or project could make
a focused intervention, education would enable her community to come away with a 
longer-term strategy for challenging power. As it turned out, the institutional
affiliations of the academic geographers enabled them to offer free, for-credit college courses,
which they taught in the community for community members. 

In her emphasis on education, Warren recognized its enduring role as a mechanism 
of both empowerment and transformation. This belief is not new; as American
educational reformer Horace Mann stated famously in 1848, "Education, then, beyond all
other devices of human origin, is a great equalizer of the conditions of men—the balance
wheel of the social machinery." But here is the thing—it really matters how we do that 
equalizing and who we imagine that equalizing to serve. For his part, Mann was literal 
about the "men": education was to be an equalizer of men, but only certain men (read: 
white, Anglo, Christian) and explicitly not women. 61 Warren, on the other hand,
recognized that access to education—and to data science education in particular—would
have to be expanded in order for it to achieve its equalizing force. 

Unfortunately, Warren's transformative vision has yet to enter the data science
classroom. As was true in Mann's era, men still lead. Women faculty comprise less than 
a third of computer science and statistics faculty. More than 80 percent of artificial 
intelligence professors are men. 62 This gender imbalance, and the narrowness of vision 
that results, is compounded by the fact that data science is often framed as an abstract 
and technical pursuit. Steps like cleaning and wrangling data are presented as solely 
technical conundrums; there is less discussion of the social context, ethics, values, or 
politics of data. 63 This perpetuates the myth that data science about astrophysics is the 
same as data science about criminal justice is the same as data science about carbon 
emissions. This limits the transformative work that can be done. Finally, because the 
goal of learning data science is modeled as individual mastery of technical concepts and 
skills, communities are not engaged and conversations are restricted. Instead,
teachers impart technical knowledge via lectures, and students complete assignments and
quizzes individually. We might call this model of teaching "the Horace Mann Factory 
Model of Data Science," because it represents the exclusionary view that Mann himself 
advanced. But let's just call it the Man Factory for short. 

The Man Factory is really good at producing men, mainly elite white men like the
ones who already lead the classes. It's not as good at producing women data scientists,
or nonbinary data scientists, or data scientists of color. For years, researchers and
advocacy organizations have recognized that there are problems with this "pipeline" for
technical fields; yet this research is framed around questions like "Why are there so
few women computer scientists?" and "Why are women leaving computing?" 64 Note
that these questions imply that it is the women who have the problem, inadvertently
perpetuating a deficit narrative. Feminist scholars who are studying the issue are, not
surprisingly, asking very different questions, like "How can the men running the Man
Factory share their power?" and "How can we structurally transform STEM education
together?" 65

One person currently modeling an answer to these questions is Laurie Rubel, the
math educator behind the Local Lotto project. If you were on the city streets of
Brooklyn or the Bronx in the past five years, you may have inadvertently crossed paths
with one of her data science classes. You probably didn't realize it because the classes
looked nothing like a traditional classroom (figure 2.4). Teenagers from the
neighborhood wandered around in small groups. They were outfitted with tablets, pen and
paper, cameras, and maps. They periodically took pictures on the street, walked into
bodegas, chatted with passersby in Spanish or English, and entered information on
their tablets.

Rubel is a leader in an area called mathematics for spatial justice, which aims to
show how mathematical concepts can be taught in ways that relate to justice
concerns arising from students' everyday lives, and to do so in dialogue with people in
their neighborhoods and communities. The goal of Local Lotto was to develop a
place-specific way of teaching concepts related to data and statistics grounded in
considerations of equity. 66 Specifically, Rubel and the other organizers of Local Lotto wanted
young learners to come up with a data-driven answer to the question: "Is the lottery
good or bad for your neighborhood?"

In New York, as in other US states that operate lotteries, lottery ticket sales go back
into the state budget—sometimes, but not always, to fund educational programs. 67 But
lottery tickets are not purchased equally across all income brackets or all
neighborhoods. Low-wage workers buy more tickets than their higher-earning counterparts.
What's more, the revenue from ticket purchases is not allocated back to those workers
or the places they live. Because of this, scholars have argued that the lottery system is
a form of regressive taxation—essentially a "poverty tax"—whereby low-income
neighborhoods are "taxed" more because they play more, but do not receive a proportional
share of the profit. 68

The Local Lotto curriculum was designed to expose high school learners to this
instance of social inequality. They begin by talking about the lottery and the idea of
probability by playing chance-based games. Then they consider jackpot games like
the Sweet Millions lottery, advertised by New York State as "your best chance from
the New York Lottery to win a million for just a buck." The best chance to win one
million, however, turns out to be about one in four million; an entire class session is
devoted to a discussion about other instances of "four million" that more closely relate
to the learners' lives. 69 The learners then leave the classroom with the goal of
collecting data about how other people experience the lottery, which takes them back into
their neighborhoods. They map stores that sell lottery tickets. They record interviews
with shopkeepers and ticket buyers on their tablets and then geolocate them on their
maps. They take pictures of lottery advertising. Afterward, the learners analyze their
results and present them to the class. They examine choropleth maps of income levels,
they make ratio tables, and they correlate state spending of lottery profits with median
family income. (No surprise: there is no correlation.) Finally, they create a data-driven
argument: an opinion piece supported with evidence from their statistical and spatial
analyses, as well as their fieldwork (figures 2.5 and 2.6).

Figure 2.4

A data science classroom. The Local Lotto project (2012-2015) taught local high school students
statistics and data analysis rooted in neighborhood and justice concerns. Courtesy of the City
Digits Project Team, including Brooklyn College, the Civic Data Design Lab at MIT, and the Center
for Urban Pedagogy. This work was supported by the National Science Foundation under Grant
No. DRL-1222430.

Collect, Analyze, Imagine, Teach 


Figure 2.5 

"Now You Know." A student's infographic poster responding to the New York Lottery advertising 
message, "Hey, you never know ... ," that explores the probability of winning anything—ranging 
from a million dollars to another ticket. Courtesy of the City Digits Project Team. This work was 
supported by the National Science Foundation under Grant No. DRL-1222430. 
Figure 2.6

Final collaborative opinion piece asserting that the lottery is not good for the students'
neighborhood and presenting evidence they collected to back up their case. See the multimedia slideshow at
http://citydigits.mit.edu/locallotto#tours-tab. Courtesy of Emmanuela, Angel, Robert, and Janeva.
This work was supported by the National Science Foundation under Grant No. DRL-1222430.


By formal measures, the Local Lotto approach worked: before one school's
implementation of Local Lotto, only two of forty-seven learners were able to determine
the correct number of possible combinations in a lottery example. Later, almost half
(twenty-one of forty-seven) were successfully able to calculate the number of
combinations. But perhaps more importantly, the Local Lotto approach made math and
statistics relevant to the students' lives. One student shared that what he learned was
"something new that could help me in my local environment, in my house actually,"
and that after the course, he tried to convince his mother to spend less money on the
lottery by "showing her my math book and all the work." Spanish-speaking women in
the class who didn't often participate in classroom discussion became essential
translators during the participatory mapping module. Several students went on to teach other
teachers about the curriculum, both locally and nationally. 70
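The "number of possible combinations in a lottery example" that the learners worked toward is a binomial coefficient: the number of ways to choose k distinct numbers from a pool of n when order does not matter. The sketch below is a minimal illustration, not part of the Local Lotto materials; the pick-6-from-40 format is an assumption chosen for illustration (the chapter does not state the game's rules), and the helper name `jackpot_combinations` is hypothetical. Under that assumed format, the count lands close to the "one in four million" odds the class discussed.

```python
from math import comb


def jackpot_combinations(pool_size: int, numbers_drawn: int) -> int:
    """Count the distinct tickets when `numbers_drawn` distinct numbers are
    chosen from 1..pool_size and order does not matter.

    This is the binomial coefficient C(n, k) = n! / (k! * (n - k)!).
    """
    return comb(pool_size, numbers_drawn)


# Hypothetical format (an assumption for illustration): pick 6 of 40.
tickets = jackpot_combinations(40, 6)
print(f"Chance of matching all six: 1 in {tickets:,}")
# prints "Chance of matching all six: 1 in 3,838,380"
```

With one ticket and equally likely draws, the chance of the top prize is simply one divided by that count, which is why a single number like this can anchor a whole class discussion about scale.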

What's different about the Local Lotto approach to teaching data analysis and
statistical concepts compared to the Man Factory? How is Local Lotto challenging
power both inside and outside the classroom? First, it was woman-led: the project
was conceived by three women leaders representing three institutions. 71 Just as
with the DGEI map and school, led by Gwendolyn Warren, the identities of the
creators matter. Second, rather than modeling data science as abstract and technical,
Local Lotto modeled a data science that was grounded in solving ethical questions
around social inequality that had relevance for learners' everyday lives: Is the lottery
good or bad for your neighborhood? The project valued lived experience: the
learners came in as "domain experts" in their neighborhoods. And it valued both
qualitative data and quantitative data: the learners spoke with neighborhood residents and
connected their beliefs, attitudes, and concerns to probability calculations. Learners
used community members' voices as evidence in their final projects. Third, rather than
valorizing individual mastery of technical skills as the gold standard, learners worked 
together during every phase of the project. They used methods from art and design 
(like the creation of infographics and digital slideshows) to practice communicating 
with data. 

Even as we celebrate these intentional pedagogical choices, the Local Lotto
project still had its shortcomings, as the organizers noted in a 2016 paper for Cognition
and Instruction. 72 Many of these stemmed from a basic fact: the teachers and course
designers of the project were white and Asian, whereas the youth in the classes were
predominantly Latinx and Black. This led to several issues. For instance, the curriculum
designers had intended to focus primarily on income inequality, but they discovered
that "the students consistently surfaced race." Because race and ethnicity were not
part of the teaching material, the teachers felt that they did not have the experience
or background to discuss them explicitly and deflected those conversations. As they
write in the paper, "Youth, and in this case youth of color, have different
understandings about racial boundaries; theirs are differently nuanced and scaled than affluent,
white, or adult perspectives." The organizers are now taking steps to explicitly integrate
discussions about race into the curriculum, as well as to include race, ethnicity, and age
data in the course projects. 73

The course designers also encountered "limited but recurring instances of resistance
from students" to the project's central focus on income inequality. They attribute this
resistance to the fact that the course was developed and taught by outsiders and could
be seen as passing judgment on the people in the students' neighborhoods. Because the
teachers were not from the community, students could read the course as perpetuating a
deficit narrative about low-income people. This is both a sophisticated and very fair
pushback from the young learners. Most people, regardless of their wealth or level of
education, know they are not going to win the lottery, after all. There is an element of
imaginative fantasy in purchasing a ticket. The campaign slogan, "Hey, you never know ..."
appeals as much to this fantasy as it does to the reality of the odds, and this fantasy has value too.
In reflecting on the unintended sense of judgment experienced by the students, the
course designers determined that, in the next iteration of the course, they would work
to connect students with people in the communities themselves who are actively
working to address issues of income inequality.

In both its successes and its failures, as well as its commitment to iteration and
trying again, Local Lotto encapsulates what it means to challenge power and privilege and
work toward justice. Justice is a journey. The discomfort that comes along with this
journey is par for the course. There is no such thing as mastery of feminism because
those who hold positions of privilege—like those in data science, like the Local Lotto
course designers, and like us, the authors of this book—are constantly learning how
to be better allies and accomplices across difference. In this process, what becomes
most important is to "stay with the trouble," as feminist philosopher Donna Haraway
would say. 74 Staying with the trouble means persisting in your work, especially when
it becomes uncomfortable, unclear, or outright upsetting. One of the biggest strengths
of the Local Lotto project is the courage of its creators to publicly, transparently, and
reflexively interrogate themselves and their process, to detail their stumbling blocks,
and to describe their commitments to doing better in the future.

Challenge Power 

After examining power, the next step is to challenge it—map by map, audit by audit,
community by community, and classroom by classroom. Collecting counterdata to
quantify and visualize structural oppression, as Gwendolyn Warren and the DGEI did
with their map, helps those who occupy positions of power understand the scope,
scale, and character of the problems from which they are otherwise far removed.
Analyzing biased algorithms, as Julia Angwin and ProPublica did, can show the real,
material harms of automated systems, as well as build a base of evidence for
political or institutional change. At the same time, it is important to remember that
minoritized individuals and groups should not have to repeatedly prove that their
experiences of oppression are real. And data alone do not always lead to change—
especially when that change also requires dominant groups to share their resources and
their power.

Those of us who use data in our work must alter some of our most basic
assumptions and imagine new starting points. Shifting the frame from concepts that secure
power, like fairness and accountability, to those that challenge power, like equity and
co-liberation, can help to ensure that data scientists, designers, and researchers take
oppression and inequality as their grounding assumption for creating computational
products and systems. We must learn from—and design with—the communities we
seek to support. A commitment to data justice begins with an acknowledgment of the
fact that oppression is real, historic, ongoing, and worth dismantling. This
commitment is one that we must teach the next generation of data scientists and data citizens,
in communities and in classrooms, if we want to broaden our path toward justice.


3 On Rational, Scientific, Objective Viewpoints from Mythical, 
Imaginary, Impossible Standpoints 

Principle: Elevate Emotion and Embodiment 

Data feminism teaches us to value multiple forms of knowledge, including the knowledge that 
comes from people as living, feeling bodies in the world. 


In 2012, twenty children and six adults were shot and killed at an elementary
school in Sandy Hook, Connecticut. In the wake of this unconscionable tragedy,
and of the additional acts of gun violence that followed, the design firm Periscopic
began a new project: to visualize all the gun deaths that took place in the United States
over the course of a calendar year. Although there is no shortage of prior work on the
subject in the form of bar charts or line graphs, Periscopic, a company with the tagline
"Do good with data," took a different approach.

When you load the project's webpage, you first see a single orange line that arcs up
from the x-axis on the left-hand side of the screen. Then, the color abruptly changes to
white. A small dot drops down, and you see the phrase, "Alexander Lipkins, killed at
29" (figure 3.1a). The line continues to arc up across the screen and then down,
coming back to rest on the x-axis, where a second phrase appears: "Could have lived to be
93." Then, a second line appears—the arc of another life. The animation speeds up and
the arcs multiply. A counter at the top right displays how many years of life have been
"stolen" from these victims of gun violence. After several excruciating minutes, the
visualization completes its count for the year: 11,419 people killed, totaling 502,025
stolen years (figure 3.1b).
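The "stolen years" counter rests on simple subtraction: a projected lifespan minus the age at death, summed over everyone killed. The sketch below is a simplification under a stated assumption: it takes each person's projected lifespan as a given input, whereas Periscopic's methods section derives that figure from demographic data, which is not reproduced here. The function name `stolen_years` is hypothetical. The sketch reproduces the first arc in the animation and the average implied by the year-end totals.

```python
def stolen_years(age_at_death: int, projected_lifespan: int) -> int:
    """Years of life lost: the projected lifespan minus the age at death.

    `projected_lifespan` is taken as a given input here; this sketch does not
    model how it would be estimated from demographic data.
    """
    return max(projected_lifespan - age_at_death, 0)


# The first arc in the animation: killed at 29, could have lived to be 93.
print(stolen_years(29, 93))  # 64

# Year-end totals shown by the counter, and the average they imply.
people_killed = 11_419
total_stolen_years = 502_025
print(round(total_stolen_years / people_killed, 1))  # 44.0
```

The average of roughly 44 years per person is one way to see what the animation conveys through accumulation rather than through a single summary number.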

The visualization uses demographic data and rigorous statistical methods to arrive
at these numbers, as is explained in the methods section on the site. But what makes
Periscopic's visualization so very different from a more conventional bar chart of
similar information, such as "The Era of 'Active Shooters'" from the Washington Post (figure
3.2)? The projects share the proposition that gun deaths present a serious threat. But
unlike the Washington Post bar chart, Periscopic's work is framed around an emotion:
loss. People are dying; their remaining time on earth has been stolen from them. These
people have names and ages. They have parents and partners and children who suffer
from that loss as well.

Figure 3.1

An animated visualization of the "stolen years" of people killed by guns in the United States in
2013. The first image (a) shows the beginning state of the animation and the second image (b)
shows the end state. Images by Periscopic.

Figure 3.2

A bar chart of the number of "active shooter" incidents in the United States between 2000 and
2015. Images by Christopher Ingraham for the Washington Post.

This message was clearly received, as was the project overall. It was featured in Wired 
magazine, and even won an Information is Beautiful award. But it also caused some 
stewing on the part of the visualization community. Alberto Cairo, the author of the 
visualization book The Truthful Art, expressed his concerns about the use of emotion 
and persuasion in the project: "Is it clear to a general audience that what they see is the 
work of professionals who actively shape data to support a cause, and not the product 
of automated processes?" 1 At root for Cairo was the question of how detached and 
"neutral" a visualization should be. He wondered, Should a visualization be designed 
to evoke emotion? 

The received wisdom in technical communication circles is, emphatically, "No." 
In the recent book A Unified Theory of Information Design, authors Nicole Amare and 
Alan Manning state: "The plain style normally recommended for technical visuals is 
directed toward a deliberately neutral emotional field, a blank page in effect, upon 
which viewers are more free to choose their own response to the information." 2 Here, 
plainness is equated with the absence of design and thus greater freedom on the part 
of the viewer to interpret the results for themselves. Things like colors and icons work
only to stir up emotions and cloud the viewer's rational mind. 

They're not the first ones to posit this belief. In the field of data communication,
any kind of ornament has long been viewed as suspect. Why? As historian of science
Theodore Porter puts it, "Quantification is a technology of distance." 3 And distance,
he explains, is closely related to objectivity because it puts literal space between
people and the knowledge they produce. This desire for separation is what underlies the
nineteenth-century statistician Karl Pearson's exhortation, echoed in Cairo's comments
about the Periscopic visualization, for people to set aside their "own feelings and
emotions" when performing statistical work. 4 The more plain, the more neutral; the more
neutral, the more objective; and the more objective, the more true—or so this line of
reasoning goes. At a data visualization master class in 2013, workshop leaders from the
Guardian newspaper held up spreadsheet data—spreadsheet data!—as an ideal for the
communication of quantitative data, calling it: "Clarity without persuasion." 5

But persuasion is everywhere, even in spreadsheets, and—as feminist philosopher
Donna Haraway would likely argue—especially in spreadsheets. In the 1980s, Haraway
was among the first to connect the seeming neutrality and objectivity of data and their
visual display to the ideas about distance that we've just discussed. She described data
visualization, in particular, as "the god trick of seeing everything from nowhere." The
view from nowhere—from a distance, from up above, like a god—may be data
visualization's signature feature. It's also the feature most ethically complicated to navigate for
the ways in which it masks the people, the methods, the questions, and the messiness
that lies behind clean lines and geometric shapes. Haraway calls it the god trick: it's a
trick because it makes the viewer believe that they can see everything, all at once, from
an imaginary and impossible standpoint. But it's also a trick because what appears
to be everything, and what appears to be neutral, is always what she terms a partial
perspective. And in most cases of seemingly "neutral" visualizations, this perspective
is that of the dominant, default group. Think back to the presumption of
whiteness as default that we discussed in the introduction, or—for an example of an actual
visualization—to the redlining map discussed in chapter 2. This is a good example of
the god trick at work. 6

The god trick and its underlying assumptions about neutrality and truth are baked
into today's best practices for data visualization. This is largely due to the influence of
one man: the renowned statistical graphics expert Edward Tufte. Back in the 1980s,
Tufte invented a metric for measuring the amount of superfluous information included
in a chart. He called it the data-ink ratio. 7 In his view, a visualization designer should
strive to use ink to display data alone. Any ink devoted to something other than
the data themselves—such as background color, iconography, or embellishment—is
suspect, an intruder in the graphic. Visual minimalism, according to this logic, appeals
to reason first. As police officer Joe Friday says to every woman character on the
American TV series Dragnet, "Just the facts, ma'am." Decorative elements, on the other hand,
are associated with messy feelings—or, worse, represent stealthy (and, according to
Tufte, unscientific) attempts at emotional persuasion. Data visualization has even been
named "the unempathetic art" by designer Mushon Zer-Aviv because of its emphatic
rejection of emotion. 8
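Tufte's metric can be written as a simple ratio. The block below paraphrases his definition from The Visual Display of Quantitative Information; the wording inside each \text{} is a paraphrase, not a quotation.

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{(proportion of the graphic that can be erased without losing data)}
```

A ratio of 1 would mean every mark on the page encodes data; every gridline, background fill, or icon that carries no data pushes the ratio below 1, which is why the metric rewards the visual minimalism described above.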

The logic that sets up this false binary between emotion and reason is gendered, of
course, because the belief that women are more emotional than men (and, by contrast,
that men are more reasoned than women) is one of the most persistent stereotypes
across many Western cultures. Indeed, psychologists have called it a master stereotype
and puzzled over how it endures even when certain emotions—even extreme ones, like
anger and pride—are simultaneously associated with men. 9 A central focus of feminist
scholarship has been to challenge false binaries like this one between reason and
emotion and to point out how they establish hierarchies as well. (We discuss this more in
chapter 4.) For now, the important thing to note is how false binaries work to benefit a
single one of Haraway's partial perspectives: that of the group already at the top—elite
white men.

How can we let go of this binary logic? Two additional questions help challenge
this reductive way of thinking and the oppressive hierarchies that it supports. First, is
visual minimalism really more neutral? And second, how might activating emotion—
leveraging, rather than resisting, emotion in data visualization—help us learn,
remember, and communicate with data? Exploring these questions helps get us closer to the
third principle of data feminism: elevate emotion and embodiment.

Visualization as Rhetoric 

Information visualization has diverse origins. Its history is often traced from the
explosion of European men mapping their colonial conquests in the late fifteenth and early
sixteenth centuries, through the development of new visual typologies like the timeline
and the bar chart in the seventeenth and eighteenth centuries, to the adoption of those
forms by powerful nations as they amassed increasing amounts of data on the
populations they sought to control. But feminist scholars are increasingly challenging this
simple narrative of progress, as well as its cast of characters, which is predominantly
white and male. Whitney Battle-Baptiste and Britt Rusert recently published a new
edition of the visualization work of W. E. B. Du Bois, the renowned Black sociologist
and civil rights activist, who created his "data portraits" of African American life for
the 1900 Paris Exposition. Laura Bliss, in a blog post that went viral, called attention to
the "narrative maps" of Shanawdithit, a member of the Beothuk (Newfoundland) tribe,
which she created around 1829 at the urging of a visiting anthropologist. And Lauren,
one of the authors of this book, created a website that reanimates the historical charts
of Elizabeth Palmer Peabody, the nineteenth-century editor and educator, who used
visualization in her teaching (figure 3.3). 10

Each of these early visualization designers understood how their images could
function rhetorically. But in more recent history, many of data visualization's theorists and
practitioners have come from technical disciplines aligned with engineering and
computer science and have not been trained in that most fundamental of Western
communication theories. In his treatise Rhetoric, Aristotle defines rhetoric as "the faculty
of observing in any given case the available means of persuasion." 11 But rhetoric isn't
only found in political speeches made by men dressed in tunics with wreaths on their
heads. 12 Any communicating object that reflects choices about the selection and
representation of reality is a rhetorical object. Whether or not it is rhetorical (it always is)
has nothing to do with whether or not it is true (it may or may not be).

The question of rhetoric matters because "a rhetorical dimension is present in every
design," says visualization researcher Jessica Hullman. 13 This includes visualizations
that do not deliberately intend to persuade people of a certain message. It especially and
definitively includes those so-called neutral visualizations that do not appear to have an
editorial hand. In fact, those might even be the most perniciously persuasive
visualizations of all!

Editorial choices become most apparent when compared with alternative choices.
For example, in his book The Curious Journalist's Guide to Data, journalist Jonathan Stray
discusses a data story from the New York Times about the September 2012 jobs report. 14
The New York Times created two graphics from the report: one framed from the
perspective of Democrats (the party in power at the time; figure 3.4a) and one framed from the
perspective of Republicans (figure 3.4b).

Either of these graphics, considered in isolation, appears to be neutral and factual.
The data are presented with standard methods (line chart and area chart respectively)
and conventional positionings (time on the x-axis, rates expressed as percentages on
the y-axis, title placed above the graphic). There is a high data-ink ratio in both cases
and very little in the way of ornamentation. But the graphics have significant editorial
differences. The Democrats' graphic emphasizes that unemployment is decreasing—in
its title, the addition of the thick blue arrow pointing downward, and the annotation
"Friday's drop was larger than expected." The Republicans' graphic, by contrast, highlights
the fact that unemployment has been steadily high for the past three years—through
the use of the "8 percent unemployment" reference line, the choice to use an area chart
instead of a line, and, of course, the title of the graphic. 15 So neither graphic is neutral,
but both graphics are factual. As Jonathan Stray says, "The constraints of truth leave a
very wide space for interpretation." 16 When visualizing data, the only certifiable fact is
that it's impossible to avoid interpretation (unless you simply republish the September
jobs report as your visualization, but then it wouldn't be a visualization).

Figure 3.3

(a) Elizabeth Palmer Peabody's chart of "Significant Events of the 17th Century United States" (1856).
(b) Peabody's chart recreated in digital form by Lauren's Digital Humanities Lab (2017). (c) A rendering
of Peabody's chart reimagined as an interactive quilt by Lauren's Digital Humanities Lab (2019). Images
by (a) Elizabeth Palmer Peabody, A Chronological History of the United States (1856), (b) the Georgia Tech
Digital Humanities Lab, and (c) Courtney Allen for the Georgia Tech Digital Humanities Lab.

Fields very close to visualization, like cartography, have long seen their work as
ideological. But discussions of rhetoric, editorial choices, and power have been far less
frequent in the field of data visualization. In 2011, Hullman and coauthor Nicholas
Diakopoulos wrote an influential paper reasserting the importance of rhetoric for
the data visualization community. 17 Their main argument was that visualizing data
involves editorial choices: some things are necessarily highlighted, while others are
necessarily obscured. When designers make these choices, they carry along with them
framing effects, which is to say they have an impact on how people interpret the
graphics and what they take away from them.

Figure 3.4

A data visualization of the September 2012 jobs report from the perspective of Democrats (a) and
Republicans (b). The New York Times data team shows how simple editorial changes lead to large
differences in framing and interpretation. As data journalist Jonathan Stray remarks on these
graphics, "The constraints of truth leave a very wide space for interpretation." Images by Mike
Bostock, Shan Carter, Amanda Cox, and Kevin Quealy, for the New York Times, as cited in The
Curious Journalist's Guide to Data by Jonathan Stray.

[Panel annotations: (a) "The rate has fallen more than 2 points since its recent peak." (b) "The rate was above 8 percent for 43 months."]

For example, it is standard practice to cite the source of one's data. This functions on
a practical level—so that readers may go out and download the data themselves. But
this choice also functions as what Hullman and Diakopoulos call provenance rhetoric,
designed to signal the transparency and trustworthiness of the presentation source to
end users. Establishing trust between the designers and their audience in turn increases
the likelihood that viewers will believe what they see.

Other aspects of data visualization also work to displace viewers' attention from
editorial choices to reinforce a graphic's perceived neutrality and "truthiness." After
doing a sociological analysis, Helen Kennedy and coauthors determined that four
conventions of data visualization reinforce people's perceptions of its factual basis: (1)
two-dimensional viewpoints, (2) clean layouts, (3) geometric shapes and lines, and
(4) the inclusion of data sources at the bottom. 18 These conventions contribute to the
perception of data visualization as objective, scientific, and neutral. Both
unemployment graphics from the New York Times employ these conventions: the image space
is two-dimensional and abstract; the layout is "clean," meaning minimal and lacking
embellishment beyond what is necessary to communicate the data; the lines
representing unemployment rates vary smoothly and faithfully against a geometrically gridded
background; and the source of the data is noted at the bottom. Either the Democrat or
the Republican graphic would have been entirely plausible as a New York Times
visualization, and very few of us would have thought to question the graphic's framing of
the data.

So if plain, "unemotional" visualizations are not neutral, but are actually extremely persuasive, then what does this mean for the concept of neutrality in general? Scientists and journalists are just some of the people who get nervous and defensive when questions about neutrality and objectivity come up. Auditors and accountants get nervous, too. They often assume that the only alternative to objectivity is a retreat into complete relativism and a world in which alternative facts reign and everyone gets a gold medal for having an opinion. But there are other options.

Rather than valorizing the neutrality ideal and trying to expunge all human traces 
from a data product because of their bias, feminist philosophers have proposed a goal 
of more complete knowledge. Donna Haraway's idea of the god trick comes from a
larger argument about the importance of developing feminist objectivity. It's not just 
data visualization but all forms of knowledge that are situated, she explains, meaning 
that they are produced by specific people in specific circumstances—cultural, historical, 
and geographic. 19 Feminist objectivity is a tool that can account for the situated nature 
of knowledge and can bring together multiple—what she terms partial —perspectives. 
Sandra Harding, who developed her ideas alongside Haraway, proposes a concept of strong objectivity. This form of objectivity works toward more inclusive knowledge production by centering the perspectives—or standpoints—of groups that are otherwise excluded from knowledge-making processes. 20 This has come to be known as standpoint theory. To supplement these ideas, Linda Alcoff has introduced the idea of positionality, a concept that emphasizes how individuals come to knowledge-making processes from multiple positions, each determined by culture and context. 21 All of these ideas offer alternatives to the quest for a universal objectivity—which is, of course, an unattainable goal.

The belief that universal objectivity should be our goal is harmful because it's always only partially put into practice. This flawed belief is what provoked renowned cardiologist Dr. Nieca Goldberg to title her book Women Are Not Small Men because she found that heart disease in women unfolds in a fundamentally different way than in men. 22 The vast majority of scientific studies—not just of heart disease, but of most medical conditions—are conducted on men, with women viewed as varying from this "norm" only by their smaller size. 23 The key to fixing this problem is to acknowledge that all science, and indeed all work in the world, is undertaken by individuals. Each person occupies a particular perspective, as Haraway might say; a particular standpoint, as Harding might say; or a particular set of positionalities, as Alcoff might say. And all would agree that research by only men and about only men cannot be universalized to make knowledge claims about all other people in the world.

Disclosing your subject position(s) is an important feminist strategy for being transparent about the limits of your—or anyone's—knowledge claims. Thus, for example, we (the authors) included statements about our own positionalities in the introduction in order to disclose the gender, race/ethnicity, class, ability, education, and other subject positions that informed the writing of this book. Rather than viewing these positionalities as threats or as influences that might have biased our work, we embraced them as offering a set of valuable perspectives that could frame our work. This is an approach that we would like to see others embrace as well. Each person's intersecting subject positions are unique, and when applied to data science, they can generate creative and wholly new research questions.




Chapter 3 


Data Visceralization 

This embrace of multiple perspectives and positionalities helps to rebalance the hierarchy of reason over emotion in data visualization. 24 How? Since the early 2000s, there has been an explosion of research about affect—the term that academics use to refer to emotions and other subjective feelings—from fields as diverse as neuroscience, geography, and philosophy. (We discuss affect further in chapter 7.) This work challenges the thinking that casts emotion out as irrational and illegitimate, even though emotion undeniably influences the social, political, and scientific processes of the world. Evelyn Fox Keller, a physicist turned philosopher, famously employed the Nobel Prize-winning research of geneticist Barbara McClintock to show how even the most profound of scientific discoveries are generated from a combination of experiment and insight, reason and emotion. 25

Once we embrace the idea of leveraging emotion in data visualization, we can truly appreciate what sets Periscopic's "US Gun Deaths" graphic apart from the Washington Post graphic or from any number of other gun death charts that have appeared in newspapers and policy documents. The Washington Post graphic, for example, represents death counts as blue ticks on a generic bar chart. If we didn't read the caption, we wouldn't know whether we were counting gun deaths in the United States or haystacks in Kansas or exports from Malaysia or any other statistic. In contrast, the Periscopic visualization leads with loss, grief, and mourning—primarily through its rhetorical emphasis on counting "stolen years." This draws the attention of viewers to "what could have been." The counting is reinforced by the visual language for representing the "stolen years" as grey lines, appropriate for numbers that are rigorously determined but not technically facts because they come from a statistical model. The visualization also uses animation and pacing to help us first appreciate the scale of one life, and then compound that scale 11,419-fold. The magnitude of the loss, especially when viewed in aggregate and over time, makes a statement of profound truth revealed to us through our own emotions. It is important to note that emotion and visual minimalism are not incompatible here; the Periscopic visualization shows us how emotion can be leveraged alongside visual minimalism for maximal effect.

Skilled data artists and designers know these things already, or at least intuit them. Like the Periscopic team, others are pushing the boundaries of what affective and embodied data visualization could look like. In 2010, Kelly Dobson founded the Data Visceralization research group at the Rhode Island School of Design (RISD) Digital + Media graduate program. The goal for this group was not to visualize data but to visceralize it. Visual things are for the eyes, but visceralizations are representations of data that the whole body can experience, emotionally as well as physically—data that "we see, hear, feel, breathe and even ingest," writes media theorist Luke Stark. 26

The reasons for visceralizing data have to do with more than simply creative experimentation. First, humans are not two eyeballs attached by stalks to a brain computer. We are embodied, multisensory beings with cultures and memories and appetites. 27 Second, people with visual disabilities need a way to access the data encoded in charts and dashboards as well. According to the World Health Organization, 253 million people globally live with some form of visual impairment, on the spectrum from limited vision to complete blindness. 28 For reasons of accessibility, Aimi Hamraie, the director of the Mapping Access project at Vanderbilt University, advocates for a form of data visceralization, although not in those exact terms: "Rather than relying entirely on visual representations of data," they explain, "digital-accessibility apps could expand access by incorporating 'deep mapping,' or collecting and surfacing information in multiple sensory formats." 29

At the moment, however, objects and events that make use of multiple sensory formats are more likely to be found in research labs, galleries, and museums. For example, in A Sort of Joy (Thousands of Exhausted Things), the theater troupe Elevator Repair Service joined forces with the data visualization firm the Office of Creative Research to script a live performance based on metadata about the artworks held by New York's Museum of Modern Art (MoMA). 30 With 123,951 works in its collection, MoMA's metadata consists of the names of artists, the titles of artworks, their media formats, and their time periods. But how does an artwork make it into the museum collection to begin with? Major art museums and their collection policies have long been the focus of feminist critique because the question of whose work gets collected translates into the question of whose work is counted in the annals of history. 31 As you might guess, this history has mostly consisted of a parade of white male European "masters," as the Guerrilla Girls project pictured in figure 3.5 reminds us.

In 1989, the Guerrilla Girls, an anonymous collective of women artists, published an infographic: Do Women Have to Be Naked to Get into the Met. Museum? The graphic makes a data-driven argument by comparing the gender statistics of artists collected by another New York museum, the Metropolitan Museum of Art (the Met), to the gender statistics of the subjects and models in the artworks. It was designed to be displayed on a billboard, but it was rejected by the sign company because it "wasn't clear enough." 32 If you ask us, it's pretty clear: the Met readily collects paintings in which women are the (naked) subjects, but it collects very few artworks created by women artists themselves.




Do women have to be naked to get into the Met. Museum?
Less than 5% of the artists in the Modern Art sections are women, but 85% of the nudes are female.
Statistics from the Metropolitan Museum of Art, New York City, 1989
Guerrilla Girls, Conscience of the Art World

Figure 3.5 

Do Women Have to Be Naked to Get into the Met. Museum? An infographic (of a sort) created by the 
Guerrilla Girls in 1989, intended to be displayed on a bus billboard. Courtesy of the Guerrilla 
Girls. 

After being thwarted by the sign company, the Guerrilla Girls then paid for the infographic to be printed on posters displayed throughout the New York City bus system, until the Metropolitan Transportation Authority (MTA) cancelled the contract, stating that the figure seemed to have more than a fan in her hand. It is definitely more than a fan, but this deliberate understatement reveals the MTA's discomfort with this provocative, activist image. 33

A Sort of Joy deploys wholly different tactics to similar ends. The performance starts with a group of white men standing in a circle in the center of the room. They face out toward the audience, which surrounds them. The men are dressed like stereotypical museum visitors: collared shirts, slacks, and so on. Each wears headphones and holds an iPad on which the names of artists in the collection scroll by. "John," the men say together. We see the iPads scrolling through all the names of the artists in the MoMA collection whose first name is John: John Baldessari, John Cage, John Lennon, John Waters, and so on. Three female performers, also wearing headphones and carrying iPads with scrolling names, pace around the circle of men. "Robert," the men say together, and the names scroll through the Roberts alphabetically. The women are silent and keep walking. "David," the men say together. It soon becomes apparent that the artists are grouped by first name, and the groups are ordered by which first name has the most works in the collection. Thus, the Johns and Roberts and Davids come first, because they have the most works in the collection. But Marys have fewer works, and Mohameds and Marias are barely in the register. Several minutes later, after the men say "Michael," "James," "George," "Hans," "Thomas," "Walter," "Edward," "Yan," "Joseph," "Martin," "Mark," "Jose," "Louis," "Frank," "Otto," "Max," "Steven," "Jack," "Henry," "Henri," "Alfred," "Alexander," "Carl," "Andre," "Harry," "Roger," and "Pierre," "Mary" finally gets her due, spoken by the female performers, the first sound they've made.

For audience members, the experience is slightly confusing at first. Why are the men in a circle? Why do they randomly speak someone's name? And why are those women walking around so intently? But "Mary" becomes a kind of aha moment, highlighting the highly gendered nature of the collection—exactly the same kind of experience of insight that data visualization is so good at producing, according to researcher Martin Wattenberg. 34 From that point on, audience members start to listen differently, eagerly awaiting the next female name. It takes more than three minutes for "Mary" to be spoken, and the next female name, "Joan," doesn't come for a full minute longer. "Barbara" follows immediately after that, and then the men return to reading: "Werner," "Tony," "Marcel," "Jonathan."

From a data analysis perspective, A Sort of Joy consists of simple operations: counting and grouping. A bar chart or a tree map of first names could easily have represented the same results. But presenting the dataset as a time-based experience makes the audience wait and listen and experience. It also runs counter to the mantra in information visualization expressed by researcher Ben Shneiderman in the mid-1990s: "Overview first, zoom and filter, then details-on-demand." 35 In this data performance, we do not see the overview first. We hear and see and experience each datapoint one at a time and only slowly construct a sense of the whole. The different gender expressions, body movements, and verbal tones of the performers draw our collective attention to the issue of gender in the MoMA collection. We start to anticipate when the next woman's name will arise. We feel the gender differential, rather than see it.
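The counting and grouping behind the performance can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Office of Creative Research's actual code: it assumes a hypothetical list of (first name, full name) records with one entry per artwork, and both the function name `order_by_first_name` and the toy records are invented for illustration rather than drawn from MoMA's metadata.

```python
from collections import Counter, defaultdict


def order_by_first_name(works):
    """Group artworks by the artist's first name and order the groups.

    `works` is a list of (first_name, full_name) pairs, one pair per artwork.
    Returns (first_name, work_count, artist_names) tuples: first names with
    the most works come first, and each group's artist names are sorted
    alphabetically, the way the Roberts scroll by in the performance.
    """
    counts = Counter(first for first, _full in works)  # counting
    artists = defaultdict(set)                          # grouping
    for first, full in works:
        artists[first].add(full)
    # Most works first; ties broken alphabetically so the order is stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(first, n, sorted(artists[first])) for first, n in ordered]


# Hypothetical toy records (not MoMA data): one tuple per artwork.
works = [
    ("John", "John A."), ("John", "John B."), ("John", "John A."),
    ("Robert", "Robert C."), ("Robert", "Robert D."),
    ("Mary", "Mary E."),
]
for first, n, names in order_by_first_name(works):
    print(first, n, names)
```

A bar chart of these counts would show the whole ordering at once; the performance instead reads the groups aloud one at a time, so the audience only gradually discovers how far down the list the first woman's name sits.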

This feeling is affect. It comprises the emotions that arise while experiencing the performance, the physiological reactions to the sounds and movements made by the performers, and the desires and drives that result—even if that drive is to walk into another room because the performance is disconcerting or just plain long.

Data visceralizations that leverage affect aren't limited to major art institutions. Catherine and artist Andi Sutton led walking tours of the future coastline of Boston based on sea level rise. 36 Interactive artist Mikhail Mansion made a leaning, bobbing chair that animatronically shifts based on real-time shifts in river currents. 37 Nonprofit organizations in Tanzania staged a design competition for data-driven clothing that incorporated statistics about gender inequality and closed the project with a fashion runway show. 38 Artist Teri Rueb stages "sound encounters" between the geologic layers of a landscape and the human body that is affected by them. 39 Simon Elvins drew a giant paper map of pirate radio stations in London that you can actually listen to. 40 A robot designed by Annina Rüst decorates real pies with pie charts about gender equality, and then visitors eat them. 41

These projects may seem to be speaking to another part of the brain (or belly) than your standard histograms or network maps, but there is something to be learned from the opportunities opened up by visceralizing data. Deliberately embracing emotions like wonder, confusion, humor, and solidarity enables a valuable form of data maximalism, one that allows for multisensory entry points, greater accessibility, and a range of learning types.

Visceralizing Uncertainty 

Scientific researchers are now proving by experiment what designers and artists have known through practice: activating emotion, leveraging embodiment, and creating novel presentation forms help people grasp and learn more from data-driven arguments, as well as remember them more fully. 42 As it turns out, visceralizing data may help designers solve one particularly pernicious problem in the visualization community: how to represent uncertainty in a medium that's become rhetorically synonymous with the truth. To this end, designers have created a huge array of charts and techniques for quantifying and representing uncertainty. These include box plots, violin plots (figure 3.6), gradient plots, and confidence intervals. 43 Unfortunately, people are terrible at recognizing uncertainty in data visualizations, even when they're explicitly told that something is uncertain. This remains true even for some researchers who use data themselves! 44

For example, let's consider the Total Electoral Votes graphic displayed as part of 
the New York Times live online coverage of the 2016 presidential election (figure 3.7). 
The blue and red lines represent the New York Times's best guess at the outcomes 
over the course of election night and into the following day. The gradient areas show 
the degree of uncertainty that surrounded those guesses, with the darker inner area 
showing electoral vote outcomes that came up 25 percent to 75 percent of the time, 
and the lighter outer areas showing outcomes that came up 75 percent to 95 percent 
and 5 percent to 25 percent of the time, respectively. If you look closely at the far left 
of the graphic, which represents election night (everything prior to the 12:00 a.m. axis 
label), the outcome of Trump winning and Clinton losing easily falls within the 5 to 25 
percent likelihood range. 
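The nested bands can be made concrete with a short sketch. It assumes, hypothetically, that a forecaster has a list of simulated electoral-vote totals for one candidate at a given moment, and it summarizes them with the percentile cut points described above using Python's standard `statistics.quantiles` (Python 3.8 or later). The function name `uncertainty_bands` and the evenly spread input are illustrative only; this is not the New York Times's actual code.

```python
import statistics


def uncertainty_bands(simulated_votes):
    """Summarize simulated electoral-vote totals as nested percentile bands.

    Returns the inner band (25th to 75th percentile) and the two outer
    bands (5th to 25th and 75th to 95th percentile).
    """
    # n=20 yields 19 cut points at the 5th, 10th, ..., 95th percentiles.
    cuts = statistics.quantiles(simulated_votes, n=20)
    p5, p25, p75, p95 = cuts[0], cuts[4], cuts[14], cuts[18]
    return {
        "inner": (p25, p75),        # drawn darker
        "outer_low": (p5, p25),     # drawn lighter
        "outer_high": (p75, p95),   # drawn lighter
    }


# Hypothetical, evenly spread totals for illustration only.
bands = uncertainty_bands(list(range(1, 100)))
print(bands)
```

Shading the inner band darker and the outer bands lighter yields the kind of gradient shown in figure 3.7; note that the bands describe where simulated outcomes fall, not vote shares.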

Although many election postmortems pronounced the 2016 election the Great Failure of Data and Statistics, because most simulations and other statistical models suggested that Clinton would win, most forecasts did include the possibility of a Trump victory. The underlying problem was not the failure of data but the difficulty of depicting uncertainty in visual form. People are just not sufficiently trained to recognize uncertainty in graphics such as this. Rather than interpreting the gradient bands as probabilities (e.g., Trump had a 20 percent chance of winning at 6 p.m.), people may interpret them as votes (e.g., Trump had 20 percent of the vote at 6 p.m.). Or they may ignore the gradient bands altogether and look only at the lines. Or they may see Clinton on top and assume she is winning. In the psychology literature, this is called using heuristics (mental shortcuts for making judgments), and it happens all the time when people are asked to assess probabilities. 45 A large part of the problem is that visualization conventions reinforce these misjudgments. The graphics look so certain, even when they are trying their very hardest to visually illustrate uncertainty!

Figure 3.6

What is the best way to communicate uncertainty in a medium that looks so certain? Designers have created diverse chart forms to try to solve this problem. Depicted here are five violin plots; each shows the distribution of data along with its probability density (the purple part). You could also think of this form as a beautiful purple vagina, as the comic xkcd has observed; see https://www.xkcd.com/1967/. Images from the Data Visualisation Catalogue.

Total Electoral Votes

The estimates below include an estimate of uncertainty. We expect the uncertainty around these estimates to narrow, especially after races are called.

Figure 3.7

A 2016 chart from the New York Times that uses opacity—darker and lighter shades of blue and red—to indicate uncertainty. Images by Gregor Aisch, Nate Cohn, Amanda Cox, Josh Katz, Adam Pearce, and Kevin Quealy for the New York Times.

Jessica Hullman, whose work on rhetoric we've already mentioned, offers one solution to this problem. Instead of creating fixed plots, as in the New York Times example, that represent uncertainty in aggregate or static form, Hullman advocates for rendering experiences of uncertainty. 46 In other words, leverage emotion and affect so that people experience uncertainty perceptually. Or, to invoke a common refrain from rhetorical training and design schools, "show, don't tell." Rather than telling people that they are looking at uncertainty while employing a certain-looking graphic style—which creates conditions ripe for those pesky heuristics to intervene—make them feel the uncertainty.

We can see a good example of showing uncertainty in action on the same New York 
Times live election coverage webpage. At the top of the page was a gauge (figure 3.8) 
that showed the New York Times's real-time prediction of who was likely to win the race, 
with a gradient of categories that ranged from medium blue ("Very Likely" that Clinton 
would win) to medium red ("Very Likely" that Trump would win). But the needle did 
not stay in one place. It jittered between the twenty-fifth and seventy-fifth percentiles, 
showing the range of outcomes that the New York Times was then predicting, based 
on simulations using the most recent data. At the beginning of the day, the range of 
motion was fairly wide but still only showed the needle on the Hillary Clinton side. 
As the night went on, its range narrowed, and the center moved closer and closer to 
the red side of the gauge. By 9 p.m., the needle jittered just a little, and on the Trump 
side only. 
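The jittering behavior can be sketched as follows. This is a minimal illustration under stated assumptions, not the New York Times's implementation: it supposes a hypothetical list of simulated vote margins (positive on one candidate's side of the gauge, negative on the other's) and draws each animation frame's needle position uniformly between the twenty-fifth and seventy-fifth percentiles of those simulations, the range described above. The function `needle_frames` and its parameters are invented for illustration.

```python
import random
import statistics


def needle_frames(simulated_margins, frames, seed=None):
    """Return hypothetical needle positions for successive animation frames.

    Each position is drawn uniformly between the 25th and 75th percentiles
    of the simulated margins, so the needle swings widely when the
    simulations disagree and barely moves when they agree.
    """
    q1, _median, q3 = statistics.quantiles(simulated_margins, n=4)
    rng = random.Random(seed)
    return [rng.uniform(q1, q3) for _ in range(frames)]


# Hypothetical margins: widely spread early on, tightly clustered later.
early = needle_frames(list(range(-40, 81)), frames=5, seed=0)
late = needle_frames([-70, -66, -64, -61, -60, -58], frames=5, seed=0)
print(early)
print(late)
```

As incoming results narrow the spread of the simulations, the interquartile range shrinks and the needle's motion shrinks with it; once every simulation falls on one side, the needle jitters on that side only.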

A number of New York Times readers were aggressive in their dislike of the jitter, calling it "irresponsible" and "unethical" and "the most stressful thing I've ever looked at online and I've seen a lot of stressful shit." 47 In response, Gregor Aisch, one of the designers of the gauge, defended it, explaining that "we thought (and still think!) this movement actually helped demonstrate the uncertainty around our forecast, conveying the relative precision of our estimates." 48 So was this "unethical" design, or the sophisticated communication of uncertainty?

Chance of Winning Presidency

Figure 3.8

The controversial "jittering" election gauge featured in the New York Times coverage of the 2016 presidential election. Images by Gregor Aisch, Nate Cohn, Amanda Cox, Josh Katz, Adam Pearce, and Kevin Quealy for the New York Times.

Building off of Hullman's work, we'd say that the answer is the latter. The jittering election gauge was actually exhibiting current best practices for communicating uncertainty. It gave people the perceptual, intuitive, visceral, and emotional experience of uncertainty to reinforce the quantitative depiction of uncertainty. The fact that it unsettled so much of the New York Times readership probably had less to do with the ethics of the visualization and more to do with the outcome of the election. So score one for emotion in the task of representing uncertainty.

Don't Never Do a God Trick 

So does this mean that all election graphics should jitter? Or that data visceralizations 
are categorically superior to visual graphics? Or that a data visualization framed around 
emotion is always the "better" choice for the design task at hand? Or that designers 
should never use the god trick to present a view from above? 

The answer to these questions may surprise you: definitively no! If there is any 
single rule in design, it's that context is queen. A design choice made in one context 
or for one audience does not translate to other contexts or audiences. Simply stated: it 
is never a good idea to say "never" in design. We delve deeper into the importance of 
context when working with data in chapter 6. 
Let's take the god trick as an example. Even though the god trick can do harm—for example, in the form of those racist, objective-looking redlining maps from chapter 2—there are also good reasons to use the god trick as a form of recuperation, contestation, or empowerment. As renowned data visualization designer Fernanda Viégas says, "The kind of overview that data visualization provides is one of the superpowers I treasure the most." 49

It is this "superpower"—the aerial view from no body—that we see put into practice in the map Coming Home to Indigenous Place Names in Canada (figure 3.9a). Margaret Pearce, a cartographer and member of the Citizen Potawatomi Nation, spent fifteen months collecting Indigenous place names from First Nations, Métis, and Inuit peoples. The map depicts the land that is known in a contemporary Anglo-Western context as Canada, but without any of the common colonial orientation points, like the boundaries of provinces such as Nova Scotia or the locations of major cities like Ottawa and Montreal. For example, you can see in figure 3.9b that places like Kinoomaagewaabikaang ("Teaching rocks") and Odawa ("Traders") and Kazabazua gajiibajiwan ("River runs under") fill the area usually described as Toronto. As the publisher's description states, "The names are ancient and recent, both in and outside of time, and they express and assert the Indigenous presence across the Canadian landscape in Indigenous languages." 50

Coming Home to Indigenous Place Names in Canada leverages the authority of the god's-eye view to challenge the colonizer's view, to advocate for a "reseeing" of the land under terms of engagement that recognize Indigenous sovereignty and respect Indigenous homelands. The extent of geographic territory included and the sheer number of names assert the Indigenous presence as major, originary, and ongoing. This is by design. Pearce intentionally created the map with the same paper size, fold, scale, and projection as the map published by Natural Resources Canada, the country's geographic authority. 51 By replicating these design features, Coming Home proposes an alternative yet equally authoritative conception of national identity. 52

There is actually another twist on top of its intentionally authoritative view: the map does not reveal everything. 53 As the cartographer, Pearce proceeded with the Indigenous methodologies of respect, responsibility, and reciprocity. 54 What this meant in practice was caring for each name as Indigenous cultural property—securing the permissions for each name and respecting when communities did not want to share English translations of the name. She is definitive about the fact that the names are not data: "The place names aren't datasets. The place names are cultural property being shared with the map that come from people." Protecting that cultural property also meant protecting the exact geographic location of each site from being shared with outsiders. This is where the scale of the god trick has protective effects: because it is generalized, at 1:5,000,000, it serves to communicate general location without pinpointing exact location. In this case, the god trick communicates Indigenous authority while preserving Indigenous autonomy.

Figure 3.9

Overview and detail view of Coming Home to Indigenous Place Names in Canada (2017). (a) The map depicts First Nations, Métis, and Inuit place names collected from many tribes and nations across what is today more commonly called Canada. (b) The detail view depicts Indigenous names from the area around Toronto. Map by Margaret W. Pearce; map design copyright 2017 Canadian-American Center, University of Maine.

Figure 3.9 (continued)

Place names in this detail image shared by permission of the following:

Alan Corbiere.

Hiio Delaronde and Jordan Engel, "Haudenosaunee Country in Mohawk," The Decolonial Atlas, decolonialatlas.wordpress.com/2015/02/04/haudenosaunee-country-in-mohawk-2/, by permission of the authors.

Charles Lippert and Jordan Engel, "The Great Lakes: An Ojibwe Perspective," The Decolonial Atlas, decolonialatlas.wordpress.com/2015/04/14/the-great-lakes-in-ojibwe-v2/, by permission of the authors.

Kitigan Zibi Anishinabeg.

Brian McInnes, Sounding Thunder: The Stories of Francis Pegahmagabow (East Lansing: Michigan State University Press, 2016), by permission of Brian McInnes, with gratitude to James Dumont and Wasauksing First Nation.

Woodland Cultural Centre, place names from Frances Froman, Alfred Keye, Lottie Keye, and Carrie Dyck, English-Cayuga/Cayuga-English Dictionary (Toronto, Ontario: University of Toronto Press); and Marianne Mithun and Reginald Henry, Wadewaygstanih. A Cayuga Teaching Grammar (Brantford, Ontario: Woodland Publishing, The Woodland Cultural Centre, 1984), by permission of Amos Key Jr. and Carrie Dyck.

Although the map has been published and the project is ostensibly finished, Pearce's commitment to and care for the Indigenous names continues. Each time the map is reproduced, such as in the detail image in figure 3.9b, Pearce writes to the communities to whom the names belong, explains the proposed context of the names, and requests permission for the names to be reproduced in that context. You can see these permissions as they were granted in the caption for the figure. Pearce proceeds with sensitivity to her own positionalities, as well as the depth of meaning of each particular place, its name, and the community. Place names are "relations," she says. "And they're not my relations, it's not my territory. ... They co-exist as relations that are incorporated into the community." Cartography, then, becomes not a straightforward representation of "what is" in some absolute sense. Rather, Coming Home is a map of relations, conversations, and shared investments across difference in the landscape.

Elevate Emotion and Embodiment 

The third principle of data feminism, and the theme of this chapter, is to elevate emotion and embodiment. As we have shown, these are crucial if often undervalued tools in the data communication toolbox. They help avoid inadvertently conveying the view from no body: the view from an imaginary and impossible standpoint that does not and cannot exist.

How has the whole picture, the overview, or the god trick come to be seen as rational 
and objective at all? How did the field of data visualization arrive at a set of conven¬ 
tions that prioritize rationality, devalue emotion, and completely ignore the nonseeing 
organs in the human body? Who is excluded when only vision is included? 

Any knowledge community inevitably places certain things at the center and casts 
others out, in the same way that male bodies are almost always taken as the norm in 
scientific studies while female bodies are viewed as deviations, or that abled bodies 
are almost always taken as primary design cases while disabled bodies require a design 
retrofit. Feminist human-computer interaction (HCI) scholar Shaowen Bardzell asserts that designers should look first to those at the margins: the people pushed to the margins in any particular design context demonstrate who and what the system is trying to exclude. 55 Subsequent work in HCI insists that designers then work to "demarginalize the 'margins' by recognizing intersections that exist, and engaging solidarity to navigate towards equity and inclusion." 56
In the case of data visualization, what is excluded is emotion and affect, embodiment and expression, embellishment and decoration. These are the aspects of human experience associated with women, and thus devalued by the logic of our master stereotype. But Periscopic's gun violence visualization shows how visual minimalism can coexist with emotion for maximum impact. Works like A Sort of Joy demonstrate that data communication can be visceral—an experience for the whole body. And Coming Home to Indigenous Place Names in Canada establishes that the god trick itself can be used to simultaneously engender emotion and challenge injustice.

Rather than making universal rules and ratios (think: data-ink) that exclude some 
aspects of human experience in favor of others, our time is better spent working toward 
a more holistic and more inclusive ideal. All design fields, including visualization and 
data communication, are fields of possibility. Black feminist sociologist Patricia Hill 
Collins describes an ideal knowledge situation as one in which "neither ethics nor 
emotions are subordinated to reason." 57 Rebalancing emotion and reason opens up the 
data communication toolbox and allows us to focus on what truly matters in a design 
process: honoring context, architecting attention, and taking action to defy stereotypes 
and reimagine the world. 58 


4 "What Gets Counted Counts" 


Principle: Rethink Binaries and Hierarchies 

Data feminism requires us to challenge the gender binary, along with other systems of counting 
and classification that perpetuate oppression. 


"Sign in or create an account to continue." At a time in which every website seems to 
require its own user account, these words often elicit a groan—and the inevitability of 
yet another password that will soon be forgotten. But for people like Maria Munir, the 
British college student who famously came out as nonbinary to then-president Barack 
Obama on live TV, the prospect of creating a new user account is more than mere 
annoyance. 1 Websites that require information about gender as part of their account 
registration process almost always only provide a binary choice: "male or female." 2 
For Munir, those options are insufficient. They also take an emotional toll: "I wince 
as I'm forced to choose 'female' over 'male' every single time, because that's what 
my passport says, and ... being non-binary is still not legally recognised in the UK," 
Munir explains. 3 

For the millions of nonbinary people in the world—that is, people who are not 
either male or female, men or women—the seemingly simple request to "select gender" 
can be difficult to answer, if it can be answered at all. 4 Yet when creating an online 
user account, not to mention applying for a national passport, the choice between 
"male" or "female," and only "male" or "female," is almost always the only one. 5 
These options (or the lack thereof) have consequences, as Munir clearly states: "If you 
refuse to register non-binary people like me with birth certificates, and exclude us in 
everything from creating bank accounts to signing up for mailing lists, you do not 
have the right to turn around and say that there are not enough of us to warrant 
change." 6 

"What gets counted counts," feminist geographer Joni Seager has asserted, and 
Munir is one person who understands that. 7 What is counted—like being a man or a 
woman—often becomes the basis for policymaking and resource allocation. By contrast, 
what is not counted—like being nonbinary—becomes invisible (although there 
are also good reasons for being invisible in some contexts, and we'll come back to 
those shortly). Seager's research focus is gender, the environment, and policy (see figure 
4.1), and she points out that there is more global data on gender being collected 
than ever before. And yet, these data collection efforts often still leave many people 
out, including nonbinary people, lesbians, and older women. Even those who 
are counted tend to be asked very narrow questions about their lives. "Women in 
poor countries seem to be asked about 6 times a day what kind of contraception they 
use," Seager quipped in a lecture at the Boston Public Library. "But they are not asked 
about whether they have access to abortion. They are not asked about what sports they 
like to play." 8 

The process of converting qualitative experience into data can be empowering, 
and even has the potential to be healing, as we address toward the end of this chapter. 
When thoughtfully collected, quantitative data can be empowering too. So many 
issues of structural inequality are problems of scale, and they can seem anecdotal until 
they are viewed as a whole. For instance, in 2014, when film professors Shelley Cobb 
and Linda Ruth Williams set out to count the women involved in the film industry in 
the United Kingdom, they encountered a woman screenwriter who had never before 
considered the fact that in the United Kingdom, women screenwriters are outnumbered 
by screenwriters of other genders at a rate of four to one. 9 She expressed surprise: 
"I didn't even know that because screenwriters never get to meet each other." 10 

A similar situation occurred in the example of ProPublica's reporting on maternal 
mortality in the United States, as discussed in chapter 1. The investigative team set out 
to count all the mothers who had died in childbirth or from complications shortly 
thereafter. They interviewed many families of women who had died while giving 
birth, but, like the screenwriter, few of the families were aware that the phenomenon 
extended beyond their own daughters and sisters, partners and friends. This lack of 
data, like the issue of maternal mortality itself, is another structural problem, and it 
serves as an example of why feminist sociologists like Ann Oakley have long advocated 
for the use of quantitative methods alongside qualitative ones. Without quantitative 
research, Oakley explains, "it is difficult to distinguish between personal experience 
and collective oppression." 11 

But before collective oppression can be identified through analyses like the one 
that ProPublica conducted, the data must exist in the first place. Which brings us back 
to Maria Munir and the importance of collecting data that reflects the population it 
purports to represent. On this issue, Facebook was ahead of the curve when, in 2014, it 
expanded the gender categories available to registered users from the standard two to 
over fifty choices, ranging from "Genderqueer" to "Neither"—a move that was widely 
praised by a range of LGBTQ+ advocacy groups (figure 4.2a). 12 One year later, when the 




[Figure 4.1 graphic: map titled "Maternity and paternity leave: Legal requirements and paid support" (2013), showing maternity leave in days (maximum shown) and paternity leave in days; annotations read "In most countries without government funding, employers are required to provide paid support" and "The US government is the only one in the developed world ..."]

Figure 4.1 

Maternity and paternity leave around the globe from The Women's Atlas, 5th edition (2018). Joni 
Seager and Annie Olson started working on the first women's atlas in 1980, when there was very 
little global data on women. The book is now in its fifth edition, but Seager highlights that there 
are still huge gender data gaps. Image courtesy of Joni Seager and Penguin Books. 
company abandoned its select-from-options model altogether, replacing the "Gender" 
dropdown menu with a blank text field, the decision was touted as even more progressive 
(figure 4.2b). 13 Because Facebook users could input any word or phrase to indicate 
their gender, they were at last unconstrained by the assumptions imposed by any 
preset choice. 14 

But additional research by information studies scholar Rena Bivens has shown that 
below the surface, Facebook continues to resolve users' genders into a binary: either 
"male" or "female." 15 Evidently, this decision was made so that Facebook could allow 
its primary clients—advertisers—to more easily market to one gender or the other. 
Put another way, even if you can choose the gender that you show to your Facebook 
friends, you can't change the gender that Facebook provides to its paying customers 
(figure 4.3). And this discrepancy leads right back to the issues of power we've been 
discussing since the start of this book: it's corporations like Facebook, and not individuals 
like Maria Munir, who have the power to control the terms of data collection. This 
remains true even as it is people like Munir who have personally (and often painfully) 
run up against the limits of those classification systems—and who best know how they 
could be improved, remade, or in some cases, abolished altogether. 
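The gap Bivens describes can be pictured as a data model with two separate fields: one the user controls and one the platform reads when serving advertisers. The Python sketch below is a hypothetical illustration of that pattern only; it is not Facebook's actual schema or code, and the class, field names, and sample profiles are all assumptions.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical sketch only: not Facebook's actual schema or code.
BinaryGender = Literal["male", "female"]

@dataclass
class Profile:
    # Shown to friends: any word or phrase the user chooses.
    displayed_gender: str
    # Used for ad targeting: typed as one of two values, here assumed to
    # come from the binary choice required at sign-up (see figure 4.3).
    ad_gender: BinaryGender

def audience_for(profiles: list[Profile], target: BinaryGender) -> list[Profile]:
    """Build an ad audience by filtering on the hidden binary field only."""
    return [p for p in profiles if p.ad_gender == target]

users = [
    Profile(displayed_gender="genderqueer", ad_gender="female"),
    Profile(displayed_gender="woman", ad_gender="female"),
    Profile(displayed_gender="neither", ad_gender="male"),
]
print([p.displayed_gender for p in audience_for(users, "female")])
```

In this sketch, editing `displayed_gender` changes what friends see, but nothing the user types reaches `ad_gender`, which is the only field the audience filter reads.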

Feminists have spent a lot of time thinking about classification systems because the 
criteria by which people are divided into the categories of man and woman are exactly 
that: a classification system. 16 And while the gender binary is one of the most widespread 
classification systems in the world today, it is no less constructed than the Facebook 
advertising platform or, say, the Golden Gate Bridge. The Golden Gate Bridge is a physical 
structure; Facebook ads are a virtual structure; and the gender binary is a conceptual 
one. But all these structures were created by people: people living in a particular place, at 
a particular time, and who were influenced—as we all are—by the world around them. 17 

Many twentieth-century feminist scholars attempted to address the social construction 
of gender by treating gender as something separate from sex. But that distinction 
is increasingly breaking down. Both gender and sex are social constructs, as it turns out. 
Even sex, which today is sometimes still considered in biologically essential terms, has 
a distinct cultural history. It can be traced to a place (Europe) and a time (the Enlightenment) 
when new theories about democracy and what philosophers called "natural 
rights" began to emerge. Before then, there was a hierarchy of the sexes, with men on 
the top and women on the bottom. (Thanks, Aristotle! 18) But there wasn't exactly a 
binary distinction between those two (or any other) sexes. In fact, according to historian 
of sex and gender Thomas Laqueur, most people believed that women were just 
inferior men, with penises located inside instead of outside of their bodies and that— 
for reals!—could descend at any time in life. 19 
Figure 4.2

(a) Facebook's initial attempt to allow users to indicate additional genders, circa 2014. Image 
courtesy of Slate. (b) Facebook's updated gender field, circa 2018. Screenshot by Lauren F. Klein. 
Figure 4.3

Detailed view of Facebook's new account creation page, circa 2018. Note that you still have to 
choose "Female" or "Male"—a binary choice—when you sign up. Screenshot by Lauren F. Klein. 


For the idea of a sex binary to gain force, it would take figures like Thomas Jefferson 
declaring that all men were created equal, and entire countries like the United States 
to be founded on that principle. Once that happened, political leaders began to worry 
about what, exactly, they had declared: to whom did the principle of equality apply? 
All sorts of systems for classifying people have their roots in that era—not only sex but 
also, crucially, race. 20 Before the eighteenth century, Western societies understood race 
as a concept tied to religious affiliation, geographic origin, or some combination of 
both. Race had very little to do with skin color until the rise of the transatlantic slave 
trade, in the seventeenth century. 21 And even then, race was still a hazy concept. It 
would take the scientific racism of the mid-eighteenth century for race to begin to be 
defined by Western societies in terms of black and white. 

Take Carl Linnaeus, for example, and the revolutionary classification system that 
he is credited with creating. 22 Linnaeus's system of binomial classification is the one 
that scientists still use today to classify humans and all other living things. But Linnaeus's 
system didn't just include the category of Homo sapiens, as it turns out. It also 
incorrectly—but as historians would tell you, unsurprisingly—included five subcategories 
of humans separated by race. (One of these five was set aside for mythological 
humans who didn't exist in real life, in case you're still ready to get behind his science.) 
But Linnaeus's classification system wasn’t even the worst of the lot. Over the course 
of the eighteenth century, increasingly racist systems of classification began to emerge, 
along with pseudosciences like comparative anatomy and physiognomy. These allowed 
elite white men to provide a purportedly scientific basis for the differential treatment 
of people of color, women, disabled people, and gay people, among other groups. 
Although those fields have long since been discredited, their legacy is still visible in 
instances ranging from the maternal health outcomes that we've already discussed 
to the divergent rates of car insurance that are offered to Black vs. white drivers, as 
described in an investigation conducted by ProPublica and Consumer Reports. 23 What's 
more, as machine learning techniques are increasingly extended into new domains of 
human life, scientific racism is itself returning. Pointing to and debunking one machine 
learning technique that employs images of faces in an attempt to classify criminals, 
three prominent artificial intelligence researchers—Blaise Agüera y Arcas, Margaret 
Mitchell, and Alexander Todorov—have asserted that scientific racism has "entered 
a new era." 24 

A simple solution might be to say, "Fine, then. Let's just not classify anything or 
anyone!" But the flaw in that plan is that data must be classified in some way to be 
put to use. In fact, by the time that information becomes data, it's already been classified 
in some way. Data, after all, is information made tractable, to borrow a term from 
computer science. "What distinguishes data from other forms of information is that 
it can be processed by a computer, or by computer-like operations," as Lauren has 
written in an essay coauthored with information studies scholar Miriam Posner. 25 And 
to enable those operations, which range from counting to sorting and from modeling 
to visualizing, the data must be placed into some kind of category—if not always 
into a conceptual category like gender, then at the least into a computational category 
like Boolean (a type of data with only two values, like true or false), integer (a type of 
number with no decimal points, like 237 or -1), or string (a sequence of letters or words, 
like "this"). 
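To make the idea of computational categories concrete, here is a minimal Python sketch written for illustration; it is not drawn from Klein and Posner's essay. It assumes a hypothetical three-question survey (fields `consented`, `age`, and `gender`) whose raw answers all arrive as text, and shows how turning them into data means assigning each value a type.

```python
# A minimal sketch, assuming a hypothetical three-question survey.
# Every raw answer arrives as text; making it "tractable" means
# placing each value into a computational category (a type).

def to_typed_record(raw: dict[str, str]) -> dict[str, object]:
    """Convert raw text answers into Boolean, integer, and string values."""
    return {
        # Boolean: only two possible values. Note the built-in judgment:
        # any answer other than "yes" is silently recorded as False.
        "consented": raw["consented"].strip().lower() == "yes",
        # Integer: a whole number with no decimal point.
        "age": int(raw["age"]),
        # String: a sequence of characters, kept as the person wrote it.
        "gender": raw["gender"].strip(),
    }

record = to_typed_record({"consented": "Yes", "age": "34", "gender": "genderqueer"})
for field, value in record.items():
    print(field, type(value).__name__, repr(value))
```

Even this tiny function classifies: the Boolean conversion turns an answer like "not sure" into False, a decision someone had to make when writing the code.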

Classification systems are essential to any working infrastructure, as information 
theorists Geoffrey Bowker and Susan Leigh Star have argued in their influential book 
Sorting Things Out. 26 This is true not only for computational infrastructures and conceptual 
ones, but also for physical infrastructures like the checkout line at the grocery 
store. Think about how angry a shopper can get when they're stuck in the express 
line behind someone with more than the designated fifteen items. Or, closer to 
home, think of the system you use (or should use) to sort your clothes for the wash. 
It's not that we should reject these classification systems out of hand, or even that we 
could if we wanted to. (We're pretty sure that no one wants all their socks to turn pink.) 
It's just that once a system is in place, it becomes naturalized as "the way things are." 
This means we don't question how our classification systems are constructed, what 
values or judgments might be encoded into them, or why they were thought up in the 
first place. In fact—and this is another point made by Bowker and Star—we often forget 
to ask these questions until our systems become objects of contention, or completely 
break down. 

Bowker and Star give the example of the public debates that took place in the 1990s 
around the categories of race employed on the US Federal Census. At issue was whether 
people should be able to choose multiple races on the census form. Among the main 
proponents of the option were multiracial people and their families, who saw it as a way 
to recognize their multiple identities rather than forcing them to squeeze themselves 
into a single, inadequate box. Those opposed included the Congressional Black Caucus 
as well as some Black and Latinx civil rights groups that saw the option as potentially 
reducing their representative voice. 27 Ultimately, the 2000 census did allow people to 
choose multiple races, and millions of people took advantage of it. But the debates 
around that single category illustrate how classification gets complicated quickly, and 
with a range of personal and political stakes. 28 

Classification systems also carry significant material consequences, and the US Census 
provides an additional example of that. Census counts are used to draw voting 
districts, make policy decisions, and allocate billions of dollars in federal resources. 
The recent Republican-led proposal to introduce a question about citizenship status 
on the 2020 census represents an attempt to wield this power to very pointed political 
ends. Because undocumented immigrants know the risks, like deportation, that 
come with being counted, they are less likely to complete the census questionnaire. 
But because both political representation and federal funding are allocated according 
to the number and geographic areas of people counted in the census, undercounting 
undocumented immigrants (and the documented immigrants they often live with) 
means less voting power—and fewer resources—accorded to those groups. This is a 
clear example of what we term the paradox of exposure: the double bind that places 
those who stand to significantly gain from being counted in the most danger from that 
same counting (or classifying) act. 

In each of these cases, as is true of any case of not fitting (or not wanting to fit) 
neatly into a box, it's important to ask whether it's the categories that are inadequate, or 
whether—and this is a key feminist move—it's the system of classification itself. Lurking 
under the surface of so many classification systems are false binaries and implied 
hierarchies, such as the artificial distinctions between men and women, reason and 
emotion, nature and culture, and body and world. Decades of feminist thinking have 
taught us to question why these distinctions have come about; what social, cultural, or 
political values they reflect; what hidden (or not so hidden) hierarchies they encode; 
and, crucially, whether they should exist in the first place. 

Questioning Classification Systems 

Let's spend some time with an actual person who has started to question the classification 
systems that surround him: one Michael Hicks, an eight-year-old Cub Scout from 
New Jersey. Why is Mikey, as he's more commonly known, so concerned about classification? 
Well, Mikey shares his name with someone who has been placed on a terrorist 
watch list by the US federal government. As a result, Mikey has also been classified as 
a potential terrorist and is subjected to the highest level of airport security screening 
every time that he travels. "A terrorist can blow his underwear up and they don't catch 
him. But my 8-year-old can't walk through security without being frisked," his mother 
lamented to Lizette Alvarez, a reporter for the New York Times who covered the issue 
in 2010. 29 

Of course, in some ways, Mikey is lucky. He is white, so he does not run the risk 
of racial profiling—unlike, for example, the many Black women who receive TSA pat-downs 
due to their natural hair. 30 Moreover, Mikey's name sounds Anglo-European, 
so he does not need to worry about religious or ethnic profiling either—unlike, for 
another example, people named Muhammad who are disproportionately pulled over 
by the police due to their Muslim name. 31 But Mikey the Cub Scout still helps to expose 
the brokenness of some of the categories that structure the TSA's terrorist classification 
system; the combination of first and last name is simply insufficient to classify someone 
as a terrorist or not. 
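Why a name pair is too coarse a category can be shown with a short Python sketch. Everything here is hypothetical: the watch-list entry, the birth years, and the matching rule are invented for illustration, and the chapter does not document how real watch-list matching works.

```python
from typing import NamedTuple

# Hypothetical sketch: the list entry and travelers are invented,
# and real watch-list matching is not documented in this chapter.

class Person(NamedTuple):
    first: str
    last: str
    birth_year: int

WATCH_LIST = [Person("Michael", "Hicks", 1960)]  # hypothetical entry

def flagged_by_name(traveler: Person) -> bool:
    """Flag a traveler whose first and last name match any list entry."""
    return any(
        traveler.first.lower() == entry.first.lower()
        and traveler.last.lower() == entry.last.lower()
        for entry in WATCH_LIST
    )

cub_scout = Person("Michael", "Hicks", 2002)
print(flagged_by_name(cub_scout))  # True: the name pair alone decides
```

Here `birth_year` is stored but never compared, so two very different people collapse into a single category.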
Or, consider another person with a history of bad experiences at the (literal) hands 
of the TSA. Sasha Costanza-Chock is nonbinary, like Maria Munir. They are also a 
design professor at MIT, so they have a lot of experience both living with and thinking 
through oppressive classification systems. In a 2018 essay, "Design Justice, A.I., 
and Escape from the Matrix of Domination," they give a concrete example of why 
design justice is needed in relation to data. 32 The essay describes how the seemingly 
simple system employed by the operators of those hands-in-the-air millimeter-wave 
airport security scanning machines is in fact quite complex—and also fundamentally 
flawed. 

Few cisgender people are aware of the fact that before you step into a scanning 
machine, the TSA agent operating the machine looks you up and down, decides whether 
you are a man or a woman, and then pushes a button to select the corresponding gender 
on the scanner's touchscreen interface. That human decision loads the algorithmic 
profile for either male bodies or female ones, against which your body's measurements 
are compared. If your measurements diverge from the statistical norm of that gender's 
body—whether the discrepancy is because you're concealing a deadly weapon, because 
your body doesn't fit neatly into either of the two categories that the system has provided, 
or because the TSA agent simply made the wrong choice—you trigger a "risk 
alert." Then, in an act of what legal theorist Dean Spade terms administrative violence, 
you are subjected to the same full-body pat-down as a potential terrorist. 33 Here it's not 
that the scanning machines rely upon an insufficient number of categories, as in the 
case of Mikey the Cub Scout, or that they employ the wrong ones, as Mikey's mom 
would likely say. It's that the TSA scanners shouldn't rely on gender to classify air travelers 
to begin with. (And while we're going down that path, how about we imagine a 
future without a state agency that systematically pathologizes Black women and trans 
people and Cub Scouts in the first place?) 
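The scanner logic described above (a human picks a gender, that choice loads a profile, and deviations from the profile raise alerts) can be sketched in a few lines of Python. This is a hypothetical illustration: the profile values, region names, and tolerance are invented and are not taken from any real scanner.

```python
# Hypothetical sketch of the pipeline described above. The profile
# values, body regions, and tolerance are invented for illustration;
# they are not taken from any real scanner.

PROFILES = {
    # Made-up "statistical norms" per region, in arbitrary units.
    "male": {"upper_torso": 1.0, "lower_torso": 1.0},
    "female": {"upper_torso": 1.3, "lower_torso": 0.7},
}
TOLERANCE = 0.2  # assumed maximum deviation before an alert

def risk_alerts(operator_choice: str, measurements: dict[str, float]) -> list[str]:
    """Return regions that deviate from the profile the operator selected."""
    norm = PROFILES[operator_choice]  # a human judgment sets the baseline
    return [
        region
        for region, value in measurements.items()
        if abs(value - norm[region]) > TOLERANCE
    ]

# A body that matches neither made-up profile exactly.
traveler = {"upper_torso": 1.3, "lower_torso": 1.0}
print(risk_alerts("female", traveler))  # ['lower_torso']
print(risk_alerts("male", traveler))    # ['upper_torso']
```

Whichever button the operator presses, this traveler triggers an alert. In the sketch, no choice between the two categories removes the problem, which echoes the argument that the binary itself is the flaw.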

So when we say that what gets counted counts, it's folks like Sasha Costanza-Chock 
or Mikey Hicks or Maria Munir that we're thinking about. Flawed classification 
systems—like the one that underlies the airport scanner's risk-detection algorithm 
or the one that determines which names end up on terrorist watch lists or simply 
(simply!) the gender binary—are not only significant problems in themselves, but also 
symptoms of a more global condition of inequality. The matrix of domination, which 
we introduced in chapter 1, describes how race, gender, and class (among other things) 
intersect to enhance opportunities for some people and constrain opportunities for 
others. 34 Under the matrix of domination, normative bodies pass through scanners, 
borders, and bathrooms with ease; these systems have been designed by people like 
them, for people like them, with an aim—sometimes explicit—of keeping people not 
like them out. 35 
As these examples help to show, the forces that operate through the matrix of domination 
are sneaky and diffuse. And they show up everywhere—even in pockets on 
pants. A recent journalistic investigation of the size of pockets in eighty pairs of men's 
and women's jeans confirmed what women (and men and nonbinary people who wear 
women’s jeans) have been saying anecdotally for years: that their pants pockets just 
aren't big enough (figure 4.4). 36 More specifically, the pockets of jeans designed for 
women are 48 percent shorter and 6.5 percent narrower than the pockets of jeans 
designed for men. This size does matter! According to the same study, only 40 percent 
Figure 4.4

From "Someone Clever Once Said Women Were Not Allowed Pockets," a comparative study of 
pockets in women's and men's jeans by The Pudding (2018). Visualization by Jan Diehm and 
Amber Thomas for The Pudding. 
of the front pockets of women's jeans can fit a smartphone, and less than half "can 
fit a wallet specifically designed to fit in front pockets." Hence the thriving market for 
women's handbags (to hold the aforementioned front-pocket wallet) and for replacement 
smartphone screens (for when your phone invariably falls out of your too-small 
pocket and cracks). 

Now, the designers of any particular pair of women's jeans are almost certainly not 
thinking: "Let's oppress women by making their pockets too small." They are probably 
only thinking about what looks nice. But what looks nice has a history too. Before 
the seventeenth century, "pockets" were external sacks on strings that could be tied 
above or below other garments. But starting in the 1600s, men’s clothing began to 
feature internal pockets. Meanwhile, women's clothing became increasingly close-cut. 
By the late eighteenth century, the women's pocket reached its breaking point, resulting 
in the emergence of a new fashion item called a reticule, otherwise known as a purse. 
These tiny handbags were made out of cloth and, according to the Victoria and Albert 
Museum's helpful online history of pockets, could not hold very much. 37 And yet, as 
the museum curators point out, in an era in which most people shared all of their 
shelves and dressers, these reticules were one of the few places for women to store any 
items they wanted to keep to themselves. Fast forward to the present, and women (and 
people who wear women's fashion) must still carry their belongings outside of their 
clothes and on public display. They're also limited in their ability to use both of their 
hands at the same time. It's (mostly) a minor annoyance, but it's one way among many 
that the patriarchy—a term that describes the combination of legal frameworks, social 
structures, and cultural values that contribute to the continued male domination of 
society—inadvertently and invisibly reproduces itself. In this case, it's pants—perhaps 
even the ones you're wearing right now—that compound and consolidate the patriarchy's 
oppressive force. 

In addition to pants pockets, one of the other things that upholds the patriarchy 
is, as it turns out, our ideas about gender itself. We've already asserted that gender is a 
social construct, but what does this phrase really mean? Queer theorist Judith Butler 
has long maintained that gender is best understood as a repeated performance, a set of 
categories that cohere by, for instance, wearing jeans with small pockets (or no pockets 
at all) or by participating in an activity that is similarly gender-coded, like child-rearing, 
or—importantly for Butler—having heterosexual sex. 38 These performative acts, 
as she terms them, repeated so many times that they become taken as fact, are what 
define the gender categories that we have today. Butler's idea of gender as performative 
moves away from an essentialist conception of the term: the idea that there are some 
innate or "essential" criteria that make one, for instance, a woman or man. But these 
performances still reinforce the categories of gender, she reminds us, even if the actions 
and activities that determine them are not innate. 

Gender is certainly complicated. This is one thing about which most contemporary 
scholars of gender largely agree. Conceptions of gender in health and clinical fields are 
evolving as well. For example, the American Medical Association now calls gender 
a "spectrum" rather than a binary, and in 2018 it issued a firm statement that 
"sex and gender are more complex than previously assumed." 39 But it's important to 
remember that there have always been more variations in gender identity and expression 
than most Anglo-Western societies have cared to acknowledge or to collectively 
remember. This is evidenced in the range of regional and vernacular terms, such as 
kothi, hijra, and dhurani, that are currently used to describe the genders of people across 
South Asia who fall outside the binary; we see it in the additional umbrella terms, such 
as two-spirit, that describe people in some North American Indigenous communities; 
and in many more. 40 Not to mention that some people are gender-fluid, meaning their 
gender identity may shift from day to day, year to year, or situation to situation. And 
yet—at least in a US context—gender data is still almost always collected in the binary 
categories of "male" and "female" and visually represented by some form of binary 
division as well. 41 This remains true even as a 2018 Stanford study found that, when 
given the choice among seven points on a gender spectrum, more than two-thirds of 
the subjects polled placed themselves somewhere in the middle. 42 

For survey designers, and data scientists more generally, there would seem to be an 
obvious response to the Stanford report: collect gender data in more than binary categories, 
making sure to disaggregate the data—that is, compare the data by genders 
during the analysis phase. One recent alternative to the binary, developed by Public 
Health England in collaboration with LGBTQ+ organizations in the United Kingdom, 
appears in figure 4.5. This two-item questionnaire was designed for use in routine 
national surveillance of HIV in England and Wales to determine self-identified 
gender and cis or trans status in a public health context. The designers offer three 
named genders, a catch-all fourth category, and an option for not disclosing gender 
identity. In a separate question, they ask about gender at birth, again giving an option 
for not disclosing. The survey design uses sensitive wording and inclusive terminology 
to allow trans and genderqueer populations to be counted. These questions are being 
considered for expanded use across other national health records and data collection 
systems in the United Kingdom. 
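Collecting the figure 4.5 categories and then disaggregating by them might look like the Python sketch below. The answer options are copied from the figure; the function name, the `used_service` outcome field, and the three toy responses are invented for illustration and are not Public Health England data.

```python
from collections import Counter

# Answer options as listed in figure 4.5; variable names are our own.
GENDER_OPTIONS = [
    "Woman (including trans woman)",
    "Man (including trans man)",
    "Non-binary",
    "In another way",
    "Prefer not to say",
]
SAME_AS_BIRTH_OPTIONS = ["Yes", "No", "Prefer not to say"]

def disaggregate(responses: list[dict], outcome: str) -> dict[str, tuple[int, int]]:
    """Return (outcome count, total responses) for each gender option present.

    Each response needs "gender" and "same_as_birth" keys plus a Boolean
    value under `outcome`. "Prefer not to say" is kept as its own group
    rather than dropped.
    """
    totals: Counter = Counter()
    hits: Counter = Counter()
    for r in responses:
        # Reject answers that are not among the offered options.
        if r["gender"] not in GENDER_OPTIONS:
            raise ValueError(f"unknown gender answer: {r['gender']!r}")
        if r["same_as_birth"] not in SAME_AS_BIRTH_OPTIONS:
            raise ValueError(f"unknown birth answer: {r['same_as_birth']!r}")
        totals[r["gender"]] += 1
        hits[r["gender"]] += int(bool(r[outcome]))
    return {g: (hits[g], totals[g]) for g in GENDER_OPTIONS if totals[g]}

# Invented toy responses; "used_service" is a hypothetical outcome field.
toy = [
    {"gender": "Non-binary", "same_as_birth": "No", "used_service": True},
    {"gender": "Woman (including trans woman)", "same_as_birth": "Yes", "used_service": False},
    {"gender": "Woman (including trans woman)", "same_as_birth": "No", "used_service": True},
]
print(disaggregate(toy, "used_service"))
```

The sketch only validates the second question; a fuller version could also cross-tabulate it, which, as the next paragraphs discuss, carries its own risks.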

Should all future gender data collection use this model? Not necessarily, and here's 
why: In a world in which quantification always leads to accurate representation, and 
accurate representation always leads to positive change, then always counting gender 
[Figure 4.5 form: "How do you identify your gender?" Woman (including trans woman) / Man (including trans man) / Non-binary / In another way / Prefer not to say. "Is this the same gender you were assigned at birth?" Yes / No / Prefer not to say.]


Figure 4.5 

From the Positive Voices survey of people living with HIV in England and Wales developed by 
Public Health England in collaboration with several partner organizations. This represents current 
best practices for collecting nonbinary gender data in an Anglo-Western public health context, 
but it's still important to recognize that different decisions might be warranted depending on the 
context. Courtesy of Peter Kirwin, Public Health England, 2018. 

identities outside the binary makes perfect sense. But being represented also means 
being made visible, and being made visible to the matrix of domination—which continuously 
develops laws, practices, and cultural norms to police the gender binary— 
poses significant risks to the health and safety of minoritized groups. Under the current 
administration in the United States, for example, transgender people are banned from 
serving in the military and, once identified as such, denied access to certain forms 
of healthcare. 43 This demonstrates some of the risks of having one's gender counted 
as something other than man or woman—risks that can occur in many contexts, 
depending on what data are being collected, by whom, and whether they are person¬ 
ally identifiable (or easily deanonymized). It's also important to recognize how trans 
and nonbinary people may possibly be identified even within otherwise large datasets 
simply because there are fewer of them relative to the larger population. This possibil¬ 
ity poses additional risks, in the form of unwanted attention in the case of people who 
would prefer not to disclose their gender identity, or in the form of discrimination, 
violence, or even imprisonment, depending on the place they live. 

As data scientists, what should we do amid these potential harms? Depending on 
the circumstances and the institution that is doing the collecting, the most ethical 
decision can vary. It might be to avoid collecting data on whether someone is cis or 
transgender, to make all gender data optional, to not collect gender data at all, or even 
to stick with binary gender categories. Social computation researcher Oliver Haimson
has asserted that "in most non-health research, it's often not necessary to know participants'
assigned gender at birth." 44 Heath Fogg Davis agrees: his book Beyond Trans
argues that we don't need to classify people by sex on passports and licenses, for bathrooms
or sports, among other things. 45 By contrast, J. Nathan Matias, Sarah Szalavitz,
and Ethan Zuckerman chose to keep gender data in binary form for their application
FollowBias, which detects gender from names, in order to avoid making a person's gender
identity public against their wishes. 46

The ethical complexity of whether to count gender, when to count gender, and 
how to count gender illuminates the complexity of acts of classification against the 
backdrop of structural oppression. When it comes to data collection, and the
categories that structure it, there are power imbalances up and down, side to side, and
everywhere in between. Because of these asymmetries, data scientists must proceed
with awareness of context (discussed further in chapter 6) and an analysis of power in
the collection environment (discussed in chapter 1) to determine whose interests
are being served by being counted, and who runs the risk of being harmed.

Rethinking Binaries in Data Visualization 

A feminist critique of counting, and of the binary classification systems that often 
structure those acts, is not limited to a focus on gender alone. A binary logic also 
pervades our thinking about race, for example, as feminist scholars Brittney Cooper 
and Margaret Rhee explain. Drawing from ideas about intersectionality, they call for 
"hacking" the Black/white binary that, on the one hand, helps to expose the racism 
experienced by Black people in the United States and, on the other, erases the other 
forms of racism experienced by Indigenous as well as Latinx, Asian American, and 
other minoritized groups. "Binary racial discourses elide our struggles for justice," they
state plainly. 47 By challenging the binary thinking that erases the experiences of certain
groups while elevating others, we can work toward more just and equitable data practices
and consequently toward a more just and equitable future.

Sometimes, however, the goal of challenging binary thinking can be constrained 
by the realities of the field. Visualization designers, for example, do not typically have 
control over the collection practices of the data they are asked to visualize. They often 
inherit binary data that they then need to "hack" from within. What might this look 
like? We might point to the reporters on the Lifestyle Desk of the Telegraph, a British 
newspaper, who, in March 2018, were considering how to honor International Women's
Day and were struck by the significant gender gap in the United Kingdom in terms
of education, politics, business, and culture. 48 As journalists, they were working with
multiple sources of data collected by other agencies, which all came in binary form. But
they wanted to ensure that they didn't further reinforce any gender stereotypes. They
paid particular attention to color. One line of designer logic would favor cultural convention
for interpretability, like using pink for women and blue for men, but a feminist
line would use color choices to hack those same conventions (figure 4.6).

[Figure 4.6 panel text: "Just six of FTSE 100 CEOs are women while fewer than one in five Executive Committee positions across the FTSE 350 are filled by women. Profit margins are almost double in companies with at least 25 per cent females on their Executive Committee compared to those with none according to The Pipeline."]

Figure 4.6

"Born Equal. Treated Unequally" was an interactive feature in the Telegraph in 2018 that examined
the gender gap in the United Kingdom along a number of dimensions. Although the authors
treated gender as a binary category, they used color to challenge stereotypical man/woman color
coding. Feature by Claire Cohen, Patrick Scott, Ellie Kempster, Richard Moynihan, Oliver Edgington,
Dario Verrengia, Fraser Lyness, George Ioakeimidis, and Jamie Johnson, for the Telegraph.

Pink and blue is, after all, another hierarchy, and the goal of the Telegraph team
members was to mitigate inequality, not reinforce it. So they looked to a different source
for inspiration: the Votes for Women campaign of early twentieth-century England, in
which purple was employed to represent freedom and dignity and green to represent
hope. When thinking about which of these colors to assign to each gender, they took
a perceptual design principle as their guide: "Against white, purple registers with far
greater contrast and so should attract more attention when putting alongside the green
[sic], not by much but just enough to tip the scales. In a lot of the visualisations men
largely outnumber women, so it was a fairly simple method of bringing them back into
focus," Fraser Lyness, the Telegraph's director of graphic journalism, told visualization
designer Lisa Charlotte Rost. 49 Here, one hierarchy—the hierarchy in which colors are
perceived by the eye—was employed to challenge another one: the hierarchy of gender.
In practice, this simple method communicated clearly
without reinforcing stereotypes.

But the Telegraph journalists could have gone one step further to rethink binaries. 
They had an opportunity to communicate to the public that gender is not a binary 
by spelling that out—in the text of the story or in a caption under the graphics or 
by showing visually that there was no data for nonbinary people. Their colleagues at 
the Guardian recently adopted this latter strategy in their interactive piece "Does the 
New Congress Reflect You?" about the 2018 US midterm elections. 50 The piece presents 
three categories: cis male, cis female, and trans + nonbinary. When you click on "trans 
+ nonbinary," as in figure 4.7, the interactive map displays all of the districts in grey, 
because "0 people in Congress are like you." The absence of data becomes an important 
takeaway, as meaningful as the data themselves. 51 

These examples have shown gender as a dimension of analysis, but how might we 
visually represent gender itself? This is a challenge of visualizing complexity of the highest
degree, and Amanda Montanez, a designer for Scientific American, took this challenge
head on (figure 4.8). She was tasked with creating an infographic to accompany an 
article on the evolving science of gender and sex—categories that she, like most people, 
viewed as distinct but related. 52 As she explains in a blog post on the Scientific American 
website, she first envisioned a simple spectrum, or perhaps two spectrums: one for sex 
and one for gender. 53 But she soon found confirmation of what we've been saying so far 
in this chapter: that few things in life can be truly reduced to binaries, and that insisting 
on binary categories of data collection—with respect to gender, to sex, to their relation, 
or to anything else—fails to acknowledge the value of what (or who) rests in between 
and outside. 

Figure 4.7

"Does the New Congress Reflect You?" is a 2018 interactive that appeared in the Guardian. Users
select their own demographic characteristics to see how many people like them are in the 2018
Congress. Clicking on "trans + nonbinary" leads to a blank map showing zero people in Congress
like you. Image by Sam Morris, Juweek Adolphe, and Erum Salam for the Guardian.

We have already established that gender is more than binary, but it's less commonly
acknowledged that sex is more than a binary too. As feminist biologist Anne Fausto-Sterling
confirms, "There is no single biological measure that unassailably places each
and every human into one of two categories—male or female." 54 Intersex people, who
constitute an estimated 1.7 percent of the population, may have ovaries and a penis,
or "mosaic genetics," in which some of one's cells have XX chromosomes and some
have XY. 55 It's also increasingly acknowledged that sex, like gender, and sometimes
together with gender, is multilayered and continuously unfolding throughout a
person's life.

To represent this complexity, Montanez had to begin by rejecting much of
the data and research that she and her research assistant turned up, either on account
of flawed categories or on account of flawed collection practices. She decided to focus
on sex, and after an extensive design process, which included consulting with domain
experts, Montanez and the design firm Pitch Interactive, which helped finalize the
diagram, arrived at the result. Beyond XX and XY is a complex diagram, which employs
a color spectrum to represent the sex spectrum, a vertical axis to represent change over
time, and branching arrows to connect to text blocks that provide additional contextual
information. The design offers a beautifully executed visual challenge to the
scientifically incorrect idea that there are only two sexes, and even to the idea that the concepts of
sex and gender are wholly distinct. Visualization is often thought of as a way to reduce
complexity, but here it operates in the reverse—to push simple, oppressive ideas to be
more complex, nuanced, and just.

Refusing Data, Recovering Data 

Montanez's graphic made what was already counted count. In other words, she took 
what scientists and theorists knew to be true about the nature of sexual differentiation 
and made that knowledge more accessible and public. But counting in itself is not
necessarily an unmitigated good, nor is putting its results on public display. We have already
introduced the idea of the paradox of exposure, in which people are harmed by being
made visible to a system. But because system designers from dominant groups do not
experience the harms of being counted or of being made visible without consent—this
is the privilege hazard, once again—they rarely anticipate these harms or account for
them in the design process. This is the reason that questions about counting must be
accompanied by questions about consent, as well as about personal safety, cultural dignity,
and historical context.

It's Facebook, once again, that helps to prove this point. Information studies scholars
Oliver Haimson and Anna Lauren Hoffman have studied the effects of the company's
"real name" policy, under which the platform determines each user's registered
name to be either "real" and authentic or simply "fake." 56 (In our teacher voices, we
now say: Does anyone note the problem with this binary thinking here?) Haimson and
Hoffman point out that trans and queer people may choose to have multiple online
identities, which may be fluid and contextual and possibly necessary to protect themselves.
As another example, abuse survivors may need to take steps to make themselves
unfindable through search, even as they still want to be connected to their loved ones. 


Figure 4.8 (following two pages) 

Beyond XX and XY (2017) visualizes the known factors that contribute to sexual differentiation
at different stages of human life, from conception to birth to puberty and beyond. Contrary
to received wisdom, sex is not a binary that is fixed at birth, but rather a layered and
time-based process of differentiation, with more than two possible outcomes. Reproduced with
permission. Copyright © 2017 Scientific American, a division of Nature America, Inc. All rights 
reserved. 
[Figure 4.8 graphic, "Beyond XX and XY: A host of factors figure into whether someone is female, male or somewhere in between," by Amanda Montanez. Panels: "Factors That Determine Sex," tracing pathways of sexual development, including intersex conditions such as 5-alpha reductase deficiency; "The Gender Spectrum," defining transgender woman, cisgender woman, nonbinary person, transgender man, and cisgender man; and a note that sexuality is also a spectrum but is separate from both sex and gender.]
Compounding the contextual nature of these factors, Facebook enforces its real
name policy algorithmically—flagging names with "too many" words or with unusual
capital letters. Haimson and Hoffman note that Facebook's algorithms disproportionately
flag Native American names for violation because those names often differ in
structure and form from Anglo-Western names (the subject position of the systems'
designers, and therefore presumed to be the default; the privilege hazard once again).
What's more, users can also report other users for not having real names, resulting in—
for example—a single person systematically targeting several hundred drag queens'
profiles for removal. Facebook claims that the real name policy exists for safety, but
Haimson and Hoffman clearly show that the policy actively imperils the safety of some
of the platform's most marginalized users. As we've already begun to suggest, sometimes
the most ethical thing to do is to help people be obscure, hidden, and invisible. 57
The example of Facebook demonstrates the fundamental importance of obtaining consent
when counting and of enabling individuals to refuse acts of counting and classification
in light of potential harms.

Acts of counting and classification, especially as they relate to minoritized groups, 
must always balance harms and benefits. When data are collected about real people 
and their lives, risks ranging from exposure to violence are always present. But when 
deliberately considered, and when consent is obtained, counting can contribute to 
efforts to increase valuable and desired visibility. The Colored Conventions Project 
(CCP), led by a team of students and faculty at the University of Delaware, demonstrates
how to thoughtfully navigate this balance in the present by looking at the
past. 58 Among the goals of the project is to create a machine-readable corpus of meeting
minutes from the nineteenth-century Colored Conventions: events in which Black
Americans, fugitive and free, gathered to strategize about how to achieve legal, social, 
economic, and educational justice. These meeting minutes are valuable because they 
have yet to be counted, so to speak, in the stories commonly told about the movement 
to end slavery in the nineteenth-century United States. Those stories tend to privilege 
the actions of white abolitionists because theirs were the stories that were recorded 
in print. But the Colored Conventions help to document the vital role of the Black 
activists who were working within their own communities to end slavery and achieve 
liberation. 

The creation of the corpus enables these important activists to be counted, as well
as to have their words (as recorded in the meeting minutes) analyzed and incorporated
into the historical record. But the process of converting the meeting minutes into data
strongly recalls the original violence that accompanied the slave trade, when human
lives—in fact, the very ancestors of these activists—were reduced to numbers and
names. In recognition of this irreconcilable tension, the CCP requires that all those
who download the corpus commit to a set of principles, including "a use of data that
humanizes and acknowledges the Black people whose collective organizational histories
are assembled" in the corpus, and a request to "contextualize and narrate the conditions
of the people who appear as 'data' and to name them when possible." 59

There is a second tension that the CCP navigates in an exemplary fashion, which 
has to do with the content of the corpus itself. Because it is derived from the conventions'
official meeting minutes, it records only the "official" participants in the conventions
and the discussions they initiated. These participants were almost exclusively
men. To address this disparity, the CCP team asks its teaching partners to sign a Memo
of Understanding (MoU) before introducing students to the project. The MoU requests
that all instructors introduce a woman involved in the conventions, such as a wife,
daughter, sister, or fellow church member, alongside every male delegate who is named
(figure 4.9). 60 From this work of recovery, the CCP is creating a second dataset of the
women's names—those who would otherwise go uncounted and therefore unrecognized
for their work. They are using data collection to make these contributions count.



Figure 4.9 

An engraving of an 1869 Colored Convention, published in Harper's Weekly, showing men at the 
podium and women seated and standing in the rear. Image courtesy of Jim Casey. 
Counting as Healing, Counting as Accountability

In the nineteenth century, as today, so many of the disparities introduced into datasets
had to do with much larger and much more profound asymmetries of power. These
asymmetries are often directly reflected in the power dynamics between who is doing
the counting and who is being counted. But when a community is counting for itself,
about itself, there is the potential that data collection can be not only empowering
but also healing. One example draws from the personal experience of one
of the authors of this book. It was 2014, and Catherine was a student and was nursing her
baby daughter at the time, as well as struggling to pump breastmilk for her in unsavory
places like server rooms and bathroom floors. Frustrated, she and six student colleagues
came together to publish a call for ideas and stories that could help to improve breast
pump technology. 61 These stories led to a research paper about breast pump design, as
well as the creation of the Make the Breast Pump Not Suck Hackathon (figure 4.10)—an
ongoing forum for sharing stories, hacking pumps, and reengineering the postpartum
ecosystem that surrounds them. 62

Figure 4.10

The 2018 Make the Breast Pump Not Suck Hackathon was the second gathering of the community
at MIT and focused on racial equity in breastfeeding, as well as shifting paid leave policy in the
United States. Photo by Rebecca Rodriguez and Ken Richardson, MIT Media Lab.

Although innovation spaces had long been holding hackathons for health technology,
the 2014 event was one of the first about birth and breastfeeding. As such, it led
to participants sharing stories in a space that was (temporarily) free of the stigma surrounding
breastfeeding. These stories pointed to common experiences and patterns in
the spirit of "the personal is political" consciousness-raising events. Participants recognized
these stories as data that could be used—and in fact were used—to demand more
from breast pump makers, from workplaces, and from society, and to help transform the
self-blame that women often experience as a result of difficulties with birth and breastfeeding
into collective political action. 63

But action by whom, and action for whom? Following the 2014 event, we (meaning 
the organizers) reflected on its successes and its limitations—in particular, its lack of 
an intersectional approach. 64 In the United States, maternal health carries significant 
race and class inequities, as discussed in chapter 1. The first hackathon did not consider
those inequities; it centered the needs of some of the most privileged mothers and
produced designs that favored their experiences. We decided to try again. In 2017 and
2018, we multiplied the single event into a participatory research project, a policy summit,
and a community innovation program, as well as a hackathon. In all of these, we
deliberately centered the needs and the participation of parents of color, low-income 
parents, and LGBTQ+ parents. When we arrived at the hackathon the second time 
around, it was the result of over a year of relationship building and identity work on 
the part of the organizers with our community partners. 

Ensuring that the 2018 hackathon would fully welcome the participation of these 
families required multiple forms of accountability. Guided by Jenn Roberts, our lead 
organizer for equity and inclusion, we wrote a values statement and convened an advisory
board with leaders in breastfeeding, equity, and maternal health. We also developed
a set of metrics to shape the demographics of the event. 65 These metrics were
designed to prioritize racial diversity, gender diversity, diversity of sexual orientation,
geographic diversity, and domain diversity, with additional priority given to young
people and newcomers. On the application form, potential participants were encouraged
to self-identify their gender and race, specify their location, and choose multiple
options from a list of predefined domain expertise categories (like "parent" or
"designer/artist"). We also invited them to write about why they wanted to attend, and
if they chose to disclose information about their sexual orientation or their financial
position, then we considered that information in the process.
Were these categories reductive? Of course they were. No person can fit their whole
self into a form, regardless of how many blank text fields are provided. Did the form
reflect the true nature of each person's intersecting identities and how those identities
impact that person's being in the world? The answer to this question is also unsurprising:
of course it did not. But the process of collecting this demographic data—which
was, crucially, undertaken voluntarily and from within the community itself—resulted 
in an event that was indeed guided by the knowledge and experience of the groups that 
our coalition had hoped to center. 66 

Catherine shared this experience with Lauren as we were beginning to draft this 
book, and we decided to use a similar process to help hold ourselves accountable to 
the values that we wanted to inform Data Feminism and the criteria by which certain 
projects and texts would be selected for inclusion. We determined specific numbers and 
percentages that, in our view, would help keep us accountable to those values, as well as 
the categories of data collection that would be required to determine whether the metrics
had been met. (These are viewable in the appendix, Our Values and Our Metrics for
Holding Ourselves Accountable.) At two phases in the process—first when we posted
the draft of the manuscript online, and second after we submitted the manuscript for
copyediting—one of our research assistants, Isabel Carter, audited the projects and citations
of the book. (They describe their research methods in more detail in "Auditing
Data Feminism," included as another appendix.) As with the hackathon, these metrics 
were not the only method we employed for holding ourselves accountable. We also 
interviewed the creators of many of the projects we reference, cleared our quotes and 
portrayals of their work with them, and published a draft of the book online for open 
peer review, among other approaches. 

Was our method of counting perfect? Of course not. We are certain we have made 
mistakes. This is among the reasons that we decided to keep our disaggregated data 
private, even as we published the aggregated results. What about the idea to count 
people and projects in the first place? Shouldn't that be viewed as contributing to the 
same reduction in complexity that we have argued against thus far in this book? As this 
chapter has demonstrated, counting is always complicated. But undertaken deliberately,
tailored to specific goals, and with issues of privacy and potential harms always in
mind, counting can be used to support accountability—as one method, among many, 
of working toward a larger goal. 

Rethink Binaries and Hierarchies 

Counting and classification can be powerful parts of the process of creating knowledge. 
But they're also tools of power in themselves. Historically, counting and classification 
have been used to dominate, discipline, and exclude. This is where the fourth principle
of data feminism, rethink binaries and hierarchies, enters in. The gender binary offers a
key example of how classification systems are constructed by cultures and societies and
reflect both their values and their biases. The cases of the TSA airport scanners, Facebook
user profiles, and plain old pants show us how gender and sex binaries—along
with scientifically incorrect understandings of both gender and sex—get encoded into 
technical systems (and also jeans)! Those systems, in turn, recirculate erroneous and 
harmful ideas. 

An intersectional feminist approach to counting insists that we examine and, if 
necessary, rethink the assumptions and beliefs behind our classification infrastructure, 
as well as consistently probe who is doing the counting and whose interests are served. 
Counting and measuring do not always have to be tools of oppression. We can also use 
them to hold power accountable, to reclaim overlooked histories, and to build collectivity
and solidarity. When we count within our own communities, with consideration
and care, we can work to rebalance unequal distributions of power. 
Principle: Embrace Pluralism

Data feminism insists that the most complete knowledge comes from synthesizing multiple
perspectives, with priority given to local, Indigenous, and experiential ways of knowing. 


In spring 2017, Bloomberg News ran an article with the provocative title "America's
Rich Get Richer and the Poor Get Replaced by Robots." 1 Using census data, the authors 
reported that income inequality is widening across the nation. San Francisco is leading 
the pack, with an income gap of almost half a million dollars between the richest and 
the poorest twenty percent of residents. As in other places, the wealth gap has race and 
gender dimensions. In the Bay Area, people of color earn sixty-eight cents for every 
dollar earned by white people, and 59 percent of single mothers live in poverty. 2 San 
Francisco also has the lowest proportion of children to adults of any major US city,
and—since 2003—an escalating rate of evictions. 

Although the San Francisco Rent Board collects data on these evictions, it does not 
track where people go after they are evicted, how many of those people end up homeless,
or which landlords are responsible for systematically evicting major blocks of the
city. In 2013, the Anti-Eviction Mapping Project (AEMP) stepped in. The initiative is 
a self-described collective of "housing justice activists, researchers, data nerds, artists, 
and oral historians." It is a multiracial group with significant, though not exclusive, 
project leadership by women. The AEMP is mapping eviction, and doing so through a 
collaborative, multimodal, and—yes—quite messy process. 

If you visit antievictionmap.com, you won't actually find a single map. There 
are seventy-eight distinct maps linked from the homepage: maps of displaced residents,
of evictions, of tech buses, of property owners, of the Filipino diaspora, of the
declining numbers of Black residents in the city, and more. The AEMP has a distinct
way of working that is grounded in its stated commitment to antiracist, feminist,
and decolonial methodologies. 3 Most of the projects happen in collaboration with
nonprofits and community-based organizations. Additional projects originate from
within the collective. For example, the group is working on producing an atlas
of the Bay Area called Counterpoints: Bay Area Data and Stories for Resisting Displacement,
which will cover topics such as migration and relocation, gentrification and the
prison pipeline, Indigenous and colonial histories of the region, and speculation about 
the future. 4 

One of the AEMP's longest-standing collaborations is with the Eviction Defense Collaborative
(EDC), a nonprofit that provides court representation for people who have
been evicted. Although the city does not collect data on the race or income of evictees,
the EDC does collect those demographics, and it works with 90 percent of the tenants
whose eviction cases end up in San Francisco courts. 5 In 2014, the EDC approached
the AEMP to help produce its annual report and in return offered to share its demographic
data with the organization. 6 Since then, the two groups have continued to
work together on EDC annual reports, as well as additional analyses of evictions with 
a focus on race. The AEMP has also gone on to produce reports with tenants' rights 
organizations, timelines of gentrification with Indigenous students, oral histories with 
grants from anthropology departments, and murals with arts organizations, as well as 
maps and more maps. 

Some of the AEMP's maps are designed to leverage the ability of data visualization 
to make patterns visible at a glance. For example, the Tech Bus Stop Eviction Map
produced in 2014 plots the locations of three years of Ellis Act evictions. 7 This is a form of
"no-fault" eviction in which landlords can claim that they are going out of the rental 
business. In many cases, this is so that they can convert the building to a condominium 
and sell the units at significant profit. San Francisco has seen almost five thousand 
invocations of the Ellis Act since 1994. On the map (figure 5.1), the AEMP plotted Ellis 
Act evictions in relationship to the location of technology company bus stops. Starting 
in the 2000s, tech companies with campuses in Silicon Valley began offering private 
luxury buses as a perk to attract employees who wanted to live in downtown San
Francisco but didn't want the hassle of commuting. Colloquially known as "the Google
buses" because Google was the most visible company to implement the practice, these 
vehicles use public bus stops—illegally at first—to shuttle employees to their offices in 
comfort (and also away from the public transportation system, which would otherwise 
reap the benefits of their fares). 8 Because a new, wealthy clientele began to seek condos 
near the bus stops, property values soared—and so did the rate of evictions of long-time 
neighborhood residents. The AEMP analysis showed that, between 2011 and 2013, 69 
percent of no-fault evictions occurred within four blocks of a tech bus stop. The map 
makes that finding plain. 
Figure 5.1

Tech Bus Stop Eviction Map by the Anti-Eviction Mapping Project (2014) plots evictions with 
"Google bus stops" in San Francisco. The group’s analysis showed that 69 percent of no-fault 
evictions in the city occurred within four blocks of a tech bus stop. Courtesy of the Anti-Eviction 
Mapping Project. 
But other AEMP maps are intentionally designed not to depict a clear correlation
between evictions and place. In Narratives of Displacement and Resistance (figure 5.2a), 
five thousand evictions are each represented as a differently sized red bubble, so the 
base map of San Francisco is barely visible underneath. 9 On top of this red sea, sky-blue 
bubbles dot the locations where the AEMP has conducted interviews with displaced 
residents, as well as activists, mediamakers, and local historians. Clicking on a blue 
bubble sets one of the dozens of oral histories in motion—for instance, the story of 
Phyllis Bowie (figure 5.2b), a resident facing eviction from her one-bedroom
apartment. "I was born and raised in San Francisco proudly," she begins. Bowie goes on to
recall how she returned from the Air Force and worked like crazy for two years at her 
own small business, building up an income record that would make her eligible for a 
lease-to-own apartment in Midtown, the historically Black neighborhood where she 
had grown up. In 2015, however, the city broke the master lease and took away the 
rent control on her building. Now tenants like Bowie, who moved there with the
promise of homeownership, are facing skyrocketing rents that none of them can afford.
Bowie is leading rent strikes and organizing the building's tenants, but their future 
is uncertain. 

This uncertainty is carried over into the design of the map, which uses a
phenomenon that is often discouraged in data visualization—occlusion—to drive home its point.
Occlusion typically refers to the "problem" that occurs when some marks (like the
eviction dots) obscure other important features (like the whole geography of the city). But
here it underscores the point that there are very few patterns to detect when the entire 
city is covered in big red eviction bubbles and abundant blue story dots. Put another 
way, the whole city is a pattern, and that pattern is the problem—much more than a 
problem of information design. 

In this way, the Narratives map enacts dissent from San Francisco city policies in the 
same way that it enacts dissent from the conventions of information design. It refuses 
both the clarity and cleanliness associated with the best practices of data visualization 
and the homogenizing and "cleanliness" associated with the forces of gentrification 
that lead to evictions in the first place. 10 The visual point of the map is simple and 
exhortative: there are too many evictions. And there are too many eviction stories. The 
map does not efficiently reveal how evictions data may be correlated with Bay Area 
Rapid Transit (BART) stops, income, Google bus stops, or any other potential
dimensions of the data. Even finding the Narratives map is difficult, given the sheer number
of maps and visualizations on the AEMP website. There is no "master dashboard" that 
integrates all the information that the AEMP has collected into a single interface. 11 



[Figure 5.2a: map titled "Narratives of Displacement and Resistance, Anti-Eviction Mapping Project." Legend: red markers represent evictions, and blue circles contain oral histories. A notice on the map states that interviewees have authorized the release of their stories only to the AEMP.]

Figure 5.2 

Narratives of Displacement and Resistance (2018) by the Anti-Eviction Mapping Project (a),
including a detail view from Phyllis Bowie's story (b). Courtesy of the Anti-Eviction Mapping Project.
Made in collaboration with the Ruth Asawa San Francisco School of the Arts. The interview was
shot by Marianne Maeckelbergh and Brandon Jourdan, edited by students Shilo Arkinson and
Avidan Novogrodsky-Godt, and facilitated by Alexandra Lacey and Jin Zhu.
But all these design choices reinforce the main purpose of the AEMP: to document the
effects of displacement and to resist it through critical and creative means. 

The AEMP thus offers a rich set of examples of feminist counterdata collection and 
countervisualization strategies. Individually and together, the maps illustrate how 
Katie Lloyd Thomas, founding member of the feminist art and architecture collective
called taking place, envisions "partiality and contestation" taking place in graphical design. "Rather
than tell one 'true' story of consensus, [the drawing/graphic] might remember and
acknowledge multiple, even contradictory versions of reality," she explains. 12 The
Narratives map populates its landscape with these contradictory realities. It demonstrates
that behind each eviction is a person—a person like Bowie, with a unique voice and a
unique story. The voices we hear are diverse, multiple, specific, divergent—and
deliberately so.

In so doing, the Narratives map, and the AEMP more generally, exemplify the fourth 
principle of data feminism and the theme of this chapter: embrace pluralism. Embracing 
pluralism in data science means valuing many perspectives and voices and doing so at 
all stages of the process—from collection to cleaning to analysis to communication. 
It also means attending to the ways in which data science methods can inadvertently 
work to suppress those voices in the service of clarity, cleanliness, and control. Many 
of our received ideas about data science work against pluralistic meaning-making
processes, and one goal of data feminism is to change that.

Strangers in the Dataset 

Data cleaning holds a special kind of mythos in data science. "It is often said that 80% 
of data analysis is spent on the process of cleaning and preparing the data," writes 
Hadley Wickham in the first sentence of the abstract of "Tidy Data," his widely cited 
paper from 2014. 13 Wickham is also the author of the tidyr package for R, a popular 
programming language and statistical computing platform. (Packages are prewritten 
collections of functions, and other forms of code, that can be installed and used in any 
project.) Wickham's package shares the sentiment expressed in his paper: that data are 
inherently messy and need to be tidied and tamed. 

Wickham is not alone in this belief. Since the release of his tidyr package, Wickham
has gone on to create the "tidyverse," an expanded set of packages, developed together with
a team of equally enthusiastic contributors, that formalize a clear workflow for processing
data. Articles in the popular and business press corroborate this
insistence on tidiness, as well as its pressing need. In Harvard Business Review, the work 
of the data scientist is glorified in terms of this tidying function: "At ease in the digital 
realm, they are able to bring structure to large quantities of formless data and make
analysis possible." 14 Here, the intrepid analyst wrangles orderly tables from
unstructured chaos. According to the article, it's "the sexiest job of the 21st century." 15 But for
the New York Times, the data analyst's work is far less attractive. A 2014 article equated
the task of cleaning data to the low-wage maintenance work of "janitors." 16

Whether or not you think data scientists are sexy (they are), and whether or not you 
think janitors should be offended by this classist reference (we all should be), certain 
assumptions and anxieties remain consistent across these different articulations about 
the need for tidiness, cleanliness, and order, and the qualities of the people who should 
be doing this work. They must be able to tame the chaos of information overload. They 
must "scrub" and "cleanse" dirty data. And they must undertake deliberate action— 
either extraordinary or mundane—to put data back in their proper place. 

But what might be lost in the process of dominating and disciplining data? Whose 
perspectives might be lost in that process? And, conversely, whose perspectives might 
be additionally imposed? The ideas expressed by Wickham, and by the press, carry the 
assumption that all data scientists, in all contexts, value cleanliness and control over 
messiness and complexity. But as the example of the AEMP demonstrates, these are not 
the requirements, nor the goals, of all data projects. 

Before moving forward, we find it important to acknowledge that these ideas about 
cleanliness and control contain troubling traces of a movement from a prior era:
eugenics, the upsetting, nineteenth-century source of much of modern statistics. As we have
referenced in previous chapters via the work of Dean Spade and Rori Rohlfs, many of
the men often cited as the earliest statistical luminaries, such as Karl Pearson, Adolphe
Quetelet, Francis Galton, and Ronald Fisher, were also leaders in the eugenics
movement. 17 In her book Ghost Stories for Darwin: The Science of Variation and the Politics of
Diversity, postcolonial science studies scholar Banu Subramaniam details how over the
course of the late nineteenth and early twentieth centuries, as statistics became the
preferred language of communication between biologists and social scientists, certain
ideas from the eugenics movement also carried over into this broader scientific
conversation. 18 While the most odious aspects of these ideas have been largely (and
thankfully) stripped away, certain core principles—like a generalized belief in the benefit of
control and cleanliness—remain. 19 To be clear: the point here is not that anyone who
cleans their data is perpetuating eugenics. 20 The point, rather, is that the ideas
underlying the belief that data should always be clean and controlled have tainted historical
roots. As data scientists, we cannot forget these roots, even as the ideas themselves have 
been tidied up over time. 
This is the long history that library and information studies scholars Katie Rawson
and Trevor Munoz likely allude to in their assertion that "the cleaning paradigm
assumes an underlying, 'correct' order." Like Subramaniam, but in the context of
cleaning data more specifically, Rawson and Munoz caution that cleaning can function as
a "diversity-hiding trick." 21 In the perceived messiness of data, there is actually rich
information about the circumstances under which the data were collected. Data studies scholar
Yanni Loukissas concurs. Rather than talking about datasets, he advocates that we talk
about data settings—his term to describe both the technical and the human processes
that affect what information is captured in the data collection process and how the 
data are then structured. 22 

As an example of how the data setting matters, Loukissas tells the story of being at a 
hackathon in Cambridge, Massachusetts, where he began to explore a dataset from the 
Clemson University Library, located in South Carolina. He stumbled across a puzzling 
record in which the librarian had noted the location of the item as "upstate." Such a 
designation is, of course, relational to the place of collection. For South Carolinians, 
upstate is a very legible term that refers to the westernmost region of the state, where 
Clemson is located. But it does not hold the same meaning for a person from New York, 
where upstate refers to its own northern region, nor does it hold the same meaning for 
a person sitting at a hackathon in Massachusetts, which does not have an upstate part 
of the state. Had someone at the hackathon written the entry from where they sat, 
they might have chosen to list the ten or so counties that South Carolinians recognize 
as upstate, so as to be more clearly legible to a wider geographic audience. But there is 
meaning conveyed by the term that would not be conveyed by other, more general 
ways of indicating the same region. Only somebody already located in South Carolina 
would have referred to that region in that way. From that usage of the term, we can 
reason that the data were originally collected in South Carolina. This information is 
not included elsewhere in the library record. 

It is because of records like this one that the process of cleaning and tidying data can 
be so complicated and, at times, can be a destructive rather than constructive act. One 
way to think of it is as chopping off the roots that connect a tree to the ground
from which it grew. It's an act that irreversibly separates the data from their context. 

We might relate the growth of tools like tidyr that help to trim and tidy data to 
another human intervention into the environment: the proliferation of street names 
and signs that transformed the landscape of the nineteenth-century United States. 
Geographer Reuben Rose-Redwood describes how, for example, prior to the
Revolutionary War, very few streets in Manhattan had signs posted at intersections. 23 Street
names, such as they existed, were vernacular and related to the particularity of a
spot—for example, "Take a right at the red house." But with the increased mobility of
people and things—think of the postal system, the railroads, or the telegraph—street
names needed to become systematized. Rose-Redwood calls this the production of
"legible urban spaces." Then, as now, there is high economic value to legible urban spaces,
particularly for large corporations (we're looking at you, Amazon) to deliver boxes of 
anything and everything directly to people's front doors. 24 

The point here is that one does not need street names for navigation until one has
strangers in the landscape. Likewise, data do not need cleaning until there are strangers in the
dataset. The Clemson University Library dataset was perfectly clear to Clemson's own
librarians. But once hackers in Cambridge got their hands on it, "upstate" started to make
a lot less sense, and it was not at all helpful in producing, for instance, a map of all
the library records in the United States. Put more generally, once the data scientists
involved in a project are not from within the community, once the place of analysis
changes, once the scale of the project shifts, or once a single dataset needs to be
combined with others—then we have strangers in the dataset.

Who are these strangers? As we've already started to suggest, people who work with 
data are alternately called unicorns (because they are rare and have special skills),
wizards (because they can do magic), ninjas (because they execute complicated, expert
moves), rock stars (because they outperform others), and janitors (because they clean
messy data) (figure 5.3). Amazon dropped the "janitor" part in a recent job ad, but it
managed to work in a few of these metaphors: "Amazon needs a rockstar engineer ...
You are passionate ... You succeed fearlessly ... You are a coding ninja." 25

These rock stars and ninjas are strangers in the dataset because, like the hackers in 
Cambridge, they often sit at one, two, or many levels removed from the collection and 
maintenance process of the data that they work with. This is a negative externality—an
inadvertent third-party consequence—that arises when working with open data,
application programming interfaces (APIs), and the vast stores of training data available
online. These data appear available and ready to mobilize, but what they represent is
not always well-documented or easily understood by outsiders. Being a stranger in the
dataset is not an inherently bad thing, but it carries significant risk of what renowned
postcolonial scholar Gayatri Spivak calls epistemic violence—the harm that dominant
groups like colonial powers wreak by privileging their ways of knowing over local and 
Indigenous ways. 26 

This problem is compounded by the belief that data work is a solitary undertaking.
This is reflected in the fact that unicorns are unique by definition, and wizards, ninjas,
and rock stars are also all people who seem to work alone. This is a fallacy, of course;
every rock star requires a backing band, and if we've learned anything from Harry
Potter, it's that any particular tap of the wand is the culmination of years of education,
training, and support. Wizards, ninjas, rock stars, and janitors each have something
else in common: they are assumed to be men. 27 If you doubt this assertion, try doing a
Google image search and count how many male-presenting wizards and janitors you
see before you get to a single female-presenting one. Or consider: why don't news articles
about "data janitors" describe them as doing "cleaning lady" work? Like janitorial
work, which is disproportionately undertaken by working-class people of color, the
idea of the data ninja also carries racist connotations. 28 And there's even more: shared
among the first four terms—unicorns, wizards, ninjas, rock stars—is a focus on the
individual's extraordinary technical expertise and their ability to prevail when others
cannot. We might have more accurately said "his ability to prevail" because these ideas
about individual mastery and prevailing against steep odds are, of course, also
associated with men.

[Figure 5.3: chart titled "Data Scientists as Unicorns, Wizards, Ninjas, Rock Stars and Janitors: Mentions in the Media, 2012-2018."]

Figure 5.3

Searching Media Cloud between 2012 and 2018 shows that unicorn is the most commonly
referenced metaphor in relation to data scientists, with wizard a close second. There are fewer than fifty
articles about data ninjas, rock stars, and janitors, but they appear in high-profile venues like the
Washington Post and Forbes. The Media Cloud platform at www.mediacloud.org was developed at
the MIT Center for Civic Media and archives just under a million articles and blog posts every day.
Graphic by Catherine D'Ignazio. Data from www.mediacloud.org.

There is a "genius" in the world of eviction data—it is Matthew Desmond,
designated as such by the MacArthur Foundation for his work on poverty and eviction
in the United States. He is a professor and director of the Eviction Lab at Princeton
University, which has been working for several years to compile a national database
of evictions and make it available to the general public. Although the federal
government collects national data on foreclosures, there is no national database of
evictions, something that is desperately needed for the many communities where housing
is in crisis. 

Initially, the Eviction Lab had approached community organizations like the AEMP 
to request their data. The AEMP wanted to know more—about privacy protections and 
how the Eviction Lab would keep the data from falling into landlord hands. Instead of 
continuing the conversation, the Eviction Lab turned to a real estate data broker and 
purchased data of lower quality. But in an article written by people affiliated with the 
AEMP and other housing justice organizations, the authors state, "AEMP and Tenants 
Together have found three-times the amount of evictions in California as Desmond's 
Eviction Lab show." 29 

As Desmond tells it, this decision was due to a change in the lab's data collection 
strategy. "We're a research lab, so one thing that's important to us is the data cleaning 
process. If you want to know does Chicago evict more people than Boston, you've got 
to compare apples to apples." 30 In Desmond's view, it is more methodologically sound
to compare datasets that have already been aggregated and standardized. And yet
Desmond acknowledges that Eviction Lab's data for the state of California is
undercounting evictions; there is even a message on the site that makes that explicit. So here's an
ethical quandary: Does one choose cleaner data at a larger scale that is relatively easy 
and quick to purchase? Or more accurate data at a local scale for which one has to 
engage and build trust with community groups? 

In this case, the priority was placed on speed at the expense of establishing trusted 
relationships with actors on the ground, and on broad national coverage at the expense 
of local accuracy. Though the Eviction Lab is doing important work, continued
decisions that prioritize speed and comprehensiveness can't help but maintain the cultural
status of the solitary "genius," effectively downplaying the work of coalitions,
communities, and movements that are—not coincidentally—often led primarily by women
and people of color. 

What might be gained if we not only recognized but also valued the fact that 
data work involves multiple voices and multiple types of expertise? What if
producing new social relationships—increasing community solidarity and enhancing social
cohesion—was valued (and funded) as much as acquiring data? We think this would 
lead to a multiplication of projects like the AEMP: projects that do demonstrable good 
with data and do so together with the communities they seek to support. 
On Power, Pluralism, and Process

Although the Anti-Eviction Mapping Project could have handed off its valuable data 
to a single mapmaking rock star-unicorn-ninja-wizard-janitor, the group made an 
intentional decision to include many designers in the process, including many
nonexperts who experienced the power of making maps for the first time. In addition to the
diverse array of data products that resulted, which reflected the diverse voices of the
AEMP's various collaborators, this decision also had the (wholly intentional) result of
building technical capacity. Slowly but surely, map by map, collaboration by
collaboration, local residents strengthened their mapping skills and their relationships with
each other. This latter result too was intentional; one of the stated goals of the AEMP 
is to "build solidarity and collectivity among the project's participants who could help 
one another in fighting evictions and collectively combat the alienation that eviction 
produces." 31 

This goal reflects a key tenet of feminist thinking, which is the recognition that a 
multiplicity of voices, rather than one single loud or technical or magical one, results 
in a more complete picture of the issue at hand. Feminist philosophers like Donna 
Haraway, whom we introduced in chapter 3, prompted a wave of thinkers who have
continued to develop the idea that all knowledge is partial, meaning no single person 
or group can claim an objective view of the capital-T Truth. 32 But embracing pluralism, 
as this concept is often described today, does not mean that everything is relative, 
nor does it mean that all truth claims have equal weight. And it most certainly does 
not mean that feminists do not believe in science. It simply means that when people 
make knowledge, they do so from a particular standpoint: from a situated, embodied 
location in the world. More than that, by pooling our standpoints—or positionalities— 
together, we can arrive at a richer and more robust understanding of the world. 33 

So, how do we begin down the path to this deeper understanding in data science? 
The first step in activating the value of multiple perspectives is to acknowledge the 
partiality of your own. This means disclosing your project's methods, your decisions, 
and—importantly for work that strives to address injustice—your own positionalities. 
This is called reflexivity, and we modeled this in the introduction to this book. You may 
have heard the phrase coined by David Weinberger, "transparency is the new
objectivity." 34 We take this to mean that there is a way to build space for transparency plus
reflexivity in data science, rather than undertaking projects that purport to be
objective (but, as we've discussed, never really are). Transparency and reflexivity allow the
people involved in any particular project to be explicit about the methods behind their 
project, as well as their own identities. 
People in journalism have been doing this for some time—at least as it relates to
their data and methods. For example, Bloomberg's interactive visualization "What's 
Really Warming the World?" (figure 5.4) walks the reader through a range of common 
arguments that try to explain away global warming with reasons that don't have to do 
with human industry or behavior. 35 It's a compelling piece in terms of content alone, 
but another interesting thing about it is that it devotes nearly a third of its screen real 
estate to describing its data and methods. 

Providing access to data and describing the methods employed are becoming
conventions in data journalism, just as they are in scientific fields. Although these accounts
are presently focused on technical details—where the data were from, what statistical
models were developed, and what analysis was performed—there is a seed of
possibility for using this same space to reveal additional details: Who was on the team? What
were points of tension and disagreement? When did the team need to talk to data 
stewards or domain experts or local communities? Which hypotheses were pursued 
but ultimately proved false? There is a story about how every evidence-based argument 
comes into being, and it is often a story that involves money and institutions, as well 
as humans and tools. Revealing this story through transparency and reflexivity can be 
a feminist act. 

Reflexivity can sometimes be as simple as being explicit about or even visualizing 
who is doing the counting and mapping behind the scenes. 36 Take the example of the 
aerial-mapping image in figure 5.5. 37 The Public Laboratory for Open Technology and 
Science (Public Lab) is a citizen science group that got its start during the BP oil spill in 
2010. 38 It makes high-resolution aerial maps by flying balloons and kites, which dangle 
cheap digital cameras over the environmental sites they seek to study. The technique 
is low cost, but the imagery produced is often higher resolution than existing satellite 
imagery because of the proximity to the ground. As in the image, the mappers
themselves are often visible in the final product in the form of little bodies, gathered in
boats or standing in clumps on a shoreline, looking up at the camera above them. The 
balloon string leads the eye back to their forms. Here the creators are not distanced or 
absent but represented in the final product. Literally. 

Data for "Good" versus Data for Co-liberation 

Embracing the value of multiple perspectives shouldn't stop with transparency and
reflexivity. It also means actively and deliberately inviting other perspectives into the
data analysis and storytelling process—more specifically, those of the people most
marginalized in any given context. Intersectional feminist scholars have long insisted that
we should be creating new knowledge and new designs from the margins. "Marginalized
subjects have an epistemic advantage, a particular perspective that scholars should
consider, if not adopt, when crafting a normative vision of a just society," as Black
feminist scholar Jennifer C. Nash explains. 39

Figure 5.4

"What's Really Warming the World?," published in 2015, devotes a third of its real estate to
describing the methods for how the authors worked with the data. Graphic by Catherine D'Ignazio,
based on reporting by Eric Roston and Blacki Migliozzi for Bloomberg Businessweek.

Figure 5.5

From a Public Lab research note by Eymund Diegel about mapping sewage flows in the Gowanus
Canal in 2012 after Hurricane Sandy. Note the people on boats doing the mapping and the
balloon tether that links the camera and image back to their bodies. Courtesy of Eymund Diegel for
Public Lab.

What does this mean? From a gender perspective, it means beginning with the
perspectives of women and nonbinary people. On a project that involves international
development data, it means beginning not with institutional goals but with
Indigenous standpoints. For the AEMP, it means centering the voices and experiences of
those who have been evicted. Follow-up work about designing from the margins argues
that designers and engineers shouldn't only be engaging people at the margins but also
actively working to dismantle the center/margins distinction in the first place. 40 More
recently, the Design Justice Network has transformed this key tenet of intersectional
thinking into one of its design principles, stating: "We center the voices of those who 
are directly impacted by the outcomes of the design process." 41 

How might this work? To begin, it requires a design process in which many actors 
can participate—people with technical expertise, as well as those with lived
expertise, domain expertise, organizing expertise, and community history expertise. It also
means shifting the overarching goal of such projects from "doing good with data" to 
designing for co-liberation; remember from chapter 2 that this is an end state in which 
people from dominant groups and minoritized groups work together to free themselves 
from oppressive systems. The key differences between data for good and data for co¬ 
liberation are highlighted in table 5.1. 

Data for good is a frame that is increasingly employed to describe data science projects
that are socially engaged and/or undertaken in the public interest. The Bloomberg
corporation has been sponsoring Data for Good conferences since 2014. Nonprofit
consulting groups like Delta Analytics have sprouted up to match volunteers who have
technical expertise with mission-driven organizations. In 2019, one such organization,
DataKind, received a $20 million gift from a funding collaborative called Data
Science for Social Impact, with monies contributed by the Rockefeller Foundation
and Mastercard. 42 There are also educational experiments underway, like the Utrecht
Data School and the University of Chicago's Data Science for Social Good summer
fellowship program. In the latter, aspiring data scientists work with governments and
nonprofit organizations to address problems in diverse domains such as education,
health, public safety, and economic development. 43 Related efforts in artificial intelligence
have sprung up, like the AI4Good Foundation, Project Impact sponsored by
Intel Corporation, the women-centered AI Summer Lab at McGill University—the
list goes on.

Table 5.1

Features of "data for good" versus data for co-liberation

Feature | "Data for good" | Data for co-liberation
Leadership by members of minoritized groups working in community | | ✓
Money and resources managed by members of minoritized groups | | ✓
Data owned and governed by the community | | ✓
Quantitative data analysis "ground truthed" through a participatory, community-centered data analysis process | | ✓
Data scientists are not rock stars and wizards, but rather facilitators and guides | | ✓
Data education and knowledge transfer are part of the project design | | ✓
Building social infrastructure—community solidarity and shared understanding—is part of the project design | | ✓

These efforts have had demonstrable social impact. And yet, there remains a nagging
fuzziness with respect to what it means to "do good." Whose good are we talking
about? What are the terms? Who maintains the databases when the unicorn-wizards
leave the community? And who pays for the cloud storage when the development portion
of the project is complete?

These issues are beginning to be discussed within the data for good community. Sara 
Hooker, a deep-learning researcher at Google Brain and the founder of Delta Analytics, 
has observed that the idea of "data for good" lacks precision. 44 To contribute clarity 
to the phrase, Hooker proposes a rough taxonomy of this type of work, identifying 
four distinct flavors: (1) volunteering skilled labor, (2) donating tech, (3) working with 
nonprofits or governments as partners, and/or (4) running data-education programs in 
underserved communities. 45 In each of these areas, there remain some thorny issues: 
the fickleness of volunteer labor, the fact that even committed volunteers often lack 
local knowledge, the community's capacity (or lack thereof) to perform and/or pay for 
its own technical maintenance, the fact that work initiated by nonprofits and governments
cannot always be assumed to be "good," and so on. Hooker's point is that when
the goal is a vague notion of "good," there is no way to address such concerns.

By contrast, a model that positions co-liberation as the end goal leads to a very specific 
set of processes and practices, as well as criteria for success. Co-liberation is grounded in 
the belief that enduring and asymmetrical power relations among social groups serve as 
the root cause of many societal problems. Rather than framing acts of technical service 
as benevolence or charity, the goal of co-liberation requires that those technical workers
acknowledge that they are engaged in a struggle for their own liberation as well,
even and especially when they are members of dominant groups. 

For these reasons, data projects designed with co-liberation in mind must be very
specific about the power dynamics involved. They must take preemptive steps to counteract
the privilege hazard that comes with work undertaken by members of dominant
groups. For example, one such step is to deliberately concentrate strategic leadership,
financial resources, and data ownership in the hands of collaborators from minoritized
groups. 46 The model also recognizes that differential power has a silencing effect and that for a
variety of reasons—as discussed in chapter 4—quantitative data can leave people out.
To address these almost certain gaps, a model of data for co-liberation would deliberately
pair quantitative analyses with inclusive civic processes, resulting in locally
informed, ground-truthed insights that derive from many perspectives.

The scope of data for co-liberation is also broader. It explicitly includes two additional
outcomes that data for good typically does not: (1) knowledge transfer and (2)
building social infrastructure. The first involves a two-way exchange: in one direction,
technical capacity building within the community so that any data products can be
maintained and/or enhanced without requiring external expertise; in the other,
external collaborators gaining an enhanced understanding of and respect for local
knowledge, as well as chipping away at their own individual and institutional privilege
hazards. Building social infrastructure involves an explicit focus on cultivating community
solidarity through the project. This entails allocating financial and human resources
to the community aspects of the project and not only to technical aspects like processing
data or building apps. In the co-liberation model, data science projects become
community science projects. They take place simultaneously in the database and in
public space.

Data for Co-liberation in Action 

What does data for co-liberation look like in action? Are there examples of feminist 
data science that value quantitative methods and pluralistic processes, data education 
and community solidarity? 

Since 2012, Rahul and Emily Bhargava have partnered with community organizations
from Belo Horizonte to Boston to create data murals in public spaces (figure
5.6). These are large-scale infographics that are both designed by and tell stories about
the people who live and work in those spaces. In all cases, the people themselves
sought out the collaboration, having recognized a need within their own community
space. For example, in 2013, Groundwork Somerville, an urban agriculture nonprofit,
approached the Bhargavas because it was in the process of establishing its first urban
farm. As Emily recalls, "The site was disorderly—it was behind a used car parts building
and hidden between other semi-industrial lots. They had built raised beds and
planted for one growing season but passersby were stealing the vegetables." 47 The
organization was also running a high school employment program called the Green
Team but struggling to fully involve the young people in its mission to create healthier
communities.

Figure 5.6

The process of making a data mural involves conversation, building prototypes with craft materials,
workshops in data analysis, and actual painting. Courtesy of Data Therapy, Emily and Rahul
Bhargava, 2018.

Linking those two challenges together, Groundwork Somerville and the Bhargavas
decided to enlist the young people in creating a mural that illustrated the purpose of
the farm to the community (figure 5.7). First, they brought together data from several
sources: demographic data from the city, geographic information system (GIS) data on
unused lots, and internal data from Groundwork Somerville that included growing
records, food donations, and attendance logs at community events. Then they worked
with the learners over several after-school sessions to review and discuss the data and
engage in storyfinding (aka data analysis). By the end of these sessions, the youth had
sketched the overall outline and iconography of the resulting mural. Read left to right,
the mural frames the problem: A man grasps for a basket of veggies, but it says "healthy
food is hard to get." The claim is backed up by data that show the cost of healthy food
and the number of people with prediabetes. Additional data document the number of
unused lots in the city and the amount of land that has been reclaimed for urban farming,
pointing to future opportunity. In the next section, the mural depicts a Groundwork
Somerville truck bringing affordable produce to the neighborhood and includes
the fact that it employs over four hundred youth residents. The data mural ends with a
vision for a unified and healthy community, "Together Livin' Better."

On July 30, 2013, the mayor of Somerville and other community leaders attended
the ribbon-cutting to officially launch the renovated garden. Emily describes the visit:
"The youth, having just spent weeks looking at the data, painting the mural together,
and building relationships with staff and volunteers, were able to talk about the story
in great detail to their elected officials." 48

Figure 5.7

The Groundwork Somerville data mural, painted by youth, staff and volunteers at Groundwork
Somerville, and the Bhargavas in 2013. Courtesy of Data Therapy, Emily and Rahul Bhargava.

The partnership represents some of the best aspects of data projects designed for
co-liberation. But murals are just one possible output from this type of pluralistic,
community-centered process. 49 For example, Digital Democracy works with Indigenous
groups around the world to defend their rights through collecting data and making
maps. 50 In the process, the organization has developed SMS services with domestic violence
groups in Haiti and supported the Wapichana people in Guyana in making a data-driven
case for land rights to the government. In another example, the Westside Atlanta Land
Trust (WALT) has worked toward affordable housing in Atlanta, Georgia, through participatory
data collection. 51 Disappointed in what it termed "triple-incorrect" county-level
data, the organization set out to collect its own spatial dataset about property
abandonment and disinvestment and has used it to push municipal policymakers
for change. 52

Each of these projects—the data mural, the land rights claim, and the abandoned
property map—could be said to exemplify the data for co-liberation model. Each originates
from a need articulated by the community, rather than projected onto it by more
powerful institutions. Each relies upon methods of data collection and processing. And
each also incorporates an explicit process of knowledge transfer from external collaborators
(consultants, academics, nonprofit specialists) to the community itself. More
importantly, none of those external collaborators envision themselves as unicorns, janitors,
ninjas, wizards, or rock stars. "At Digital Democracy, we try to fight the superhero
narrative," says director Emily Jacobi. "We are sidekicks rather than superheroes." 53
Along these lines, each of these projects is explicitly designed to build community
solidarity and shared understanding around a civic issue. As Amanda Meng and Carl 
DiSalvo explain with respect to the WALT project, "Data collection became a way of 
generating not simply awareness of the conditions, but a sensitivity to the conditions 
and to [community members'] own experience of and situation of being blinded to 
these conditions." 54 Data collection and participatory analysis become a gathering 
technology—a kind of campfire—where information is exchanged, social cohesion is 
enhanced, and future actions are co-conspired. 55 The campfire model also has the effect 
of challenging the deficit narratives that so often surround underserved communities. 56 
Instead, as social movement scholar Maribel Casas-Cortes has asserted, cultivating 
community solidarity contributes to a sense of collective agency and transformative 
possibility, what she describes as "an innovative sense of political participation and 
re-invigorating political imaginaries." 57 

Does the Campfire Scale? 

Data for co-liberation is simultaneously more specific than data for good and larger in
scope. By advocating for community-based leadership and resources, qualitative investigations
to complement quantitative analyses, and intentional project design in areas
that exceed the purview of data for good projects, such as knowledge transfer and
building community solidarity, data for co-liberation enables the participation of many
actors and agents. But can this pluralistic model scale up? What would "big data for
co-liberation" look like? Would it be possible at all?

Questions about big data versus little data or quantitative data versus qualitative 
data are far too often framed as false binaries. And as with all false binaries, there 
are paths through those distinctions that combine and multiply options, rather than 
reduce or divide the world into fake choices. The key question to keep in mind is how 
we can scale up data for co-liberation in ways that remain careful, community-based, 
and complex. 58 

One real-world example of this intentional practice is the Global Atlas of Environmental
Justice (EJ Atlas), a large-scale research project and open archive (figure 5.8)
directed by scholars Leah Temper and Joan Martínez-Alier. 59 Initiated in 2012 by a
team of researchers at the Universitat Autònoma de Barcelona, in Spain, the EJ Atlas
represents a systematic collection of global ecological conflicts. It was created in part
as a response to assertions that environmental justice scholars were focusing too much
attention on local case studies at the expense of making global connections. For the
most part, the global ecological conflicts that the EJ Atlas collects are cases straight out
of the matrix of domination, of wealthy people overutilizing natural resources and
displacing environmental risk and degradation to poorer people, who are often also
minoritized because of their gender, indigeneity, race, and/or geography. For example,
one of the cases in the EJ Atlas discusses the efforts of Berta Cáceres and the Lenca
Indigenous people to oppose the construction of a dam and hydropower plant in Honduras.
60 Cáceres won international awards for her community organizing, only to be
tragically assassinated by employees of the company building the dam. (The assassins
were later convicted in a Honduran court of law. 61) In the EJ Atlas, the entry on Cáceres's
organizing efforts includes copious links, photos, and geographic data about the
dam, as well as information about where the incident falls in the atlas's typology of
conflict.

Figure 5.8

The Global Atlas of Environmental Justice (EJ Atlas; https://ejatlas.org/) works in partnership with
activists, civil society organizations, and social movements to systematically document ecological
conflicts around the globe. The scope and scale of the EJ Atlas enables activists to connect with
others and facilitates researchers studying conflicts in a quantitative and comparative context,
without sacrificing a commitment to a pluralistic process and the dignity of local and community
knowledges. Courtesy of the Global Atlas of Environmental Justice, 2019.

The EJ Atlas demonstrates that scale is not incompatible with valuing local knowledges
and relationships with local actors. Temper and her colleagues were able to draw
on past relationships with partner organizations to assemble their archive, demonstrating
how time invested in building relationships at the outset of a project, or of a career,
continues to accrue benefits for all parties involved. 62 At the time of writing, the EJ Atlas
includes close to three thousand cases of ecological conflicts from around the world.
Collaborators have created submaps from the atlas for countries including Colombia,
Italy, and Turkey. Activists have used the atlas to draw media and policymaker attention
to overlooked environmental conflicts, such as the construction of a ski resort in
the middle of a nature park in Kazakhstan, which has disturbed hydrological systems. 63
The atlas has also enabled comparative empirical research, like that undertaken by
economist Begüm Özkaynak and several colleagues. 64 Their work employs social network
analysis to study the relationships among mining corporations, their financiers,
and the environmental groups challenging mining projects, in order to understand the
geography of the actors and their connections to each other. 65

Scale is emphatically not incompatible with the feminist imperative to value multiple
and local knowledges. Research like that of Özkaynak and her colleagues would
not be possible without a large-scale archive like the EJ Atlas. A pluralistic, participatory,
iterative process like the one behind the EJ Atlas will take longer to scale than the extractive,
quantity-at-all-costs approach of conventional big data. But ultimately the data—and
the relationships, and the community capacity—will be of higher quality.

Embrace Pluralism 

The fifth principle of data feminism is to embrace pluralism in the whole process of 
working with data, from collection to analysis to communication to decision-making. 
As we describe in this chapter, data work carries a high risk of enacting what Gayatri 
Spivak has termed epistemic violence, particularly when the people doing the work are 
strangers in the dataset, when they are one or more steps removed from the local context
of the data, and when they view themselves (or are viewed by society) as unicorns,
rock stars, and wizards. 

Embracing pluralism is a feminist strategy for mitigating this risk. It allows both 
time and space for a range of participants to contribute their knowledge to a data 
project and to do so at all stages of that project. In contrast to an underspecified data 
for good model, embracing pluralism offers a way to work toward a model of data for 
co-liberation. This means transferring knowledge from experts to communities and
explicitly cultivating community solidarity in data work, as we see in the case of the 
Anti-Eviction Mapping Project. Moreover, embracing pluralism is not incompatible 
with "bigness" or scale; the EJ Atlas shows how pluralistic processes can be used to 
assemble a global archive and support empirical work in the service of justice. A single 
data scientist wizard will never defeat the matrix of domination alone, no matter how 
powerful their spells might be. But a well-designed, data-driven, participatory process, 
one that centers the standpoints of those most marginalized, empowers project participants,
and builds new relationships across lines of social difference—well, that might
just have a chance. 


6 The Numbers Don't Speak for Themselves 


Principle: Consider Context 

Data feminism asserts that data are not neutral or objective. They are the products of unequal 
social relations, and this context is essential for conducting accurate, ethical analysis. 


In April 2014, 276 young women were kidnapped from their high school in the town
of Chibok in northern Nigeria. Boko Haram, a militant terrorist group, claimed responsibility
for the attacks. The press coverage, both in Nigeria and around the world, was
fast and furious. SaharaReporters.com challenged the government's ability to keep its
students safe. CNN covered parents' anguish. The Japan Times connected the kidnappings
to the increasing unrest in Nigeria's northern states. And the BBC told the story
of a girl who had managed to evade the kidnappers. Several weeks after this initial 
reporting, the popular blog FiveThirtyEight published its own data-driven story about 
the event, titled "Kidnapping of Girls in Nigeria Is Part of a Worsening Problem." 1 
The story reported skyrocketing rates of kidnappings. It asserted that in 2013 alone 
there had been more than 3,608 kidnappings of young women. Charts and maps 
accompanied the story to visually make the case that abduction was at an all-time high 
(figure 6.1). 

Shortly thereafter, the news website had to issue an apologetic retraction because
its numbers were just plain wrong. The outlet had used the Global Database of Events,
Language and Tone (GDELT) as its data source. GDELT is a big data project led by computational
social scientist Kalev Leetaru. It collects news reports about events around
the world and parses the news reports for actors, events, and geography with the aim of
providing a comprehensive set of data for researchers, governments, and civil society.
GDELT tries to focus on conflict—for example, whether conflict is likely between two
countries or whether unrest is sparking a civil war—by analyzing media reports. However,
as political scientist Erin Simpson pointed out to FiveThirtyEight in a widely cited
Twitter thread, GDELT's primary data source is media reports (figure 6.2). 2 The project is
not at a stage at which its data can be used to make reliable claims about independent
cases of kidnapping. The kidnapping of schoolgirls in Nigeria was a single event. There
were thousands of global media stories about it. Although GDELT de-duplicated some
of those stories to a single event, it still logged, erroneously, that hundreds of kidnapping
events had happened that day. The FiveThirtyEight report had counted each of
those GDELT pseudoevents as a separate kidnapping incident.

Figure 6.1

Chart: "Abduction Trends: Daily kidnappings in Nigeria, 1982-2014."

In 2014, FiveThirtyEight erroneously charted counts of "daily kidnappings" in Nigeria. The news
site failed to recognize that the data source it was using was not counting events, but rather media
reports about events. Or some events and some media reports. Or it was counting something, but
we are still not sure what. Image by FiveThirtyEight.

The error was embarrassing for FiveThirtyEight, not to mention for the reporter, but it
also helps to illustrate some of the larger problems related to data found "in the wild."
First, the hype around "big data" leads to projects like GDELT wildly overstating the
completeness and accuracy of their data and algorithms. On the website and in publications,
the project leads have stated that GDELT is "an initiative to construct a catalog
of human societal-scale behavior and beliefs across all countries of the world, connecting
every person, organization, location, count, theme, news source, and event across
the planet into a single massive network that captures what's happening around the
world, what its context is and who's involved, and how the world is feeling about it,
every single day." 3 That giant mouthful describes no small or impotent big data tool. It
is clearly Big Dick Data.

Figure 6.2

Two tweets by Erin Simpson in response to FiveThirtyEight's erroneous interpretation of the GDELT
dataset. Tweets by Erin Simpson on May 13, 2014.

Big Dick Data is a formal, academic term that we, the authors, have coined to denote 
big data projects that are characterized by masculinist, totalizing fantasies of world 
domination as enacted through data capture and analysis. Big Dick Data projects 
ignore context, fetishize size, and inflate their technical and scientific capabilities. 4 
In GDELT's case, the question is whether we should take its claims of big data at face 
value or whether the Big Dick Data is trying to trick funding organizations into giving 
the project massive amounts of research funding. (We have seen this trick work many 
times before.) 
The GDELT technical documentation does not provide any more clarity as to
whether it is counting media reports (as Simpson asserts) or single events. The database
FiveThirtyEight used is called the GDELT Event Database, which certainly makes
it sound like it's counting events. The GDELT documentation states that "if an event
has been seen before it will not be included again," which also makes it sound like it's
counting events. And a 2013 research paper related to the project confirms that GDELT
is indeed counting events, but only events that are unique to specific publications. So
it's counting events, but with an asterisk. Compounding the matter, the documentation
offers no guidance as to what kinds of research questions are appropriate to ask of the
database or what the limitations might be. People like Simpson who are familiar with
the area of research known as event detection, or members of the GDELT community,
may know not to believe (1) the title of the database, (2) the documentation, and (3)
the marketing hype. But how would outsiders, let alone newcomers to the platform,
ever know that?
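To make the "asterisk" concrete, the Python sketch below shows how the answer to "how many kidnappings?" shifts depending on what counts as one event. Everything in it is a hypothetical simplification for illustration: the `Record` fields, the toy rows, and the three de-duplication keys are assumptions, not GDELT's actual schema or de-duplication logic. By construction, the toy data describe only two real incidents.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One row extracted from one news article (hypothetical schema)."""
    source: str      # publication that ran the story
    date: str        # date attached to the extracted event
    event_type: str  # coded event category
    location: str    # geocoded place name


# Hypothetical toy data. By construction there are only TWO real incidents:
# one in town_a (reported many times) and one in town_b (reported once).
RECORDS = [
    Record("outlet_a", "day_1", "ABDUCT", "town_a"),
    Record("outlet_a", "day_1", "ABDUCT", "town_a"),  # same outlet, updated story
    Record("outlet_b", "day_1", "ABDUCT", "town_a"),
    Record("outlet_c", "day_1", "ABDUCT", "town_a"),
    Record("outlet_d", "day_2", "ABDUCT", "town_a"),  # follow-up story, later date
    Record("outlet_a", "day_1", "ABDUCT", "town_b"),
]


def count_rows(records):
    """Treat every row as its own incident (the error in the chart)."""
    return len(records)


def count_unique_per_publication(records):
    """Drop repeats only when they come from the same publication."""
    return len({(r.source, r.date, r.event_type, r.location) for r in records})


def count_unique_across_publications(records):
    """Merge rows sharing date, event type, and location, whatever the source."""
    return len({(r.date, r.event_type, r.location) for r in records})


if __name__ == "__main__":
    print(count_rows(RECORDS))                        # 6
    print(count_unique_per_publication(RECORDS))      # 5
    print(count_unique_across_publications(RECORDS))  # 3
```

None of the three counts recovers the two incidents built into the toy data: a follow-up story filed under a later date still looks like a new event. Which number a database actually reports, and why, is the kind of context that has to come from outside the rows themselves.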

We've singled out GDELT, but the truth is that it's not very different from any number
of other data repositories out there on the web. There are a proliferating number
of portals, observatories, and websites that make it possible to download all manner of
government, corporate, and scientific data. There are APIs that make it possible to write
little programs to query massive datasets (like, for instance, all of Twitter) and download
them in a structured way. 5 There are test datasets for network analysis, machine
learning, social media, and image recognition. There are fun datasets, curious datasets,
and newsletters that inform readers of datasets to explore for journalism or analysis. 6
In our current moment, we tend to think of this unfettered access to information as an
inherent good. And in many ways, it is kind of amazing that one can just google and
download data on, for instance, pigeon racing, the length of guinea pig teeth, or every
single person accused of witchcraft in Scotland between 1562 and 1736—not to mention
truckloads and truckloads of tweets. 7

And though the schooling on data verification received by FiveThirtyEight was well
deserved, there is a much larger issue that remains unaddressed: the issue of context.
As we've discussed throughout this book, one of the central tenets of feminist thinking
is that all knowledge is situated. A less academic way to put this is that context matters.
When approaching any new source of knowledge, whether it be a dataset or dinner
menu (or a dataset of dinner menus), it's essential to ask questions about the social,
cultural, historical, institutional, and material conditions under which that knowledge
was produced, as well as about the identities of the people who created it. 8 Rather than
seeing knowledge artifacts, like datasets, as raw input that can be simply fed into a
statistical analysis or data visualization, a feminist approach insists on connecting data
back to the context in which they were produced. This context allows us, as data scientists,
to better understand any functional limitations of the data and any associated
ethical obligations, as well as how the power and privilege that contributed to their
making may be obscuring the truth.

Situating Data on the Wild Wild Web 

The major issue with much of the data that can be downloaded from web portals or 
through APIs is that they come without context or metadata. If you are lucky you 
might get a paragraph about where the data are from or a data dictionary that describes 
what each column in a particular spreadsheet means. But more often than not, you get 
something that looks like figure 6.3. 

The data shown in the figure—open budget data about government procurement in
São Paulo, Brazil—do not look very technically complicated. The complicated part is
figuring out how the business process behind them works. How does the government
run the bidding process? How does it decide who gets awarded a contract? Are all the
bids published here, or just the ones that were awarded contracts? What do terms like
competition, cooperation agreement, and terms of collaboration mean to the data publisher?
Why is there such variation in the publication numbering scheme? These are only a
few of the questions one might ask when first encountering this dataset. But without
answers to even some of these questions—to say nothing of the local knowledge
required to understand how power is operating in this particular ecosystem—it would
be difficult to even begin a data exploration or analysis project.

This scenario is not uncommon. Most data arrive on our computational doorstep
context-free. And this lack of context becomes even more of a liability when
accompanied by the kind of marketing hype we see in GDELT and other Big Dick 
Data projects. In fact, the 1980s version of these claims is what led Donna Haraway 
to propose the concept of situated knowledge in the first place. 9 Subsequent feminist 
work has drawn on the concept of situated knowledge to elaborate ideas about ethics 
and responsibility in relation to knowledge-making. 10 Along this line of thinking, it 
becomes the responsibility of the person evaluating that knowledge, or building upon 
it, to ensure that its "situatedness" is taken into account. For example, information 
studies scholar Christine Borgman advocates for understanding data in relation to 
the "knowledge infrastructure" from which they originate. As Borgman defines it, a 
knowledge infrastructure is "an ecology of people, practices, technologies, institutions, 
material objects, and relationships." 11 In short, it is the context that makes the data 
possible. 


Publication no. | Contracting body | Procurement type | Opening date | Object
01-PREF/SECOM/2018 | Municipal Government Secretariat (SGM) | Open competitive tender (concorrência) | 10/06/2019, 14:00 | Hiring of a company to provide press-office and communications advisory services to PREF/SECOM
03/SGM-2019 | SGM, Purchasing and Contracts Administration | Open competitive tender (concorrência) | 03/06/2019, 10:30 | Sale of the municipal property located on Avenida Professor Alceu Maynard Araújo, in the Santo Amaro district
01/SMPED/2019 | Municipal Secretariat for Persons with Disabilities (SMPED) | Invitation for price quotations (tomada de preços) | 31/05/2019, 10:30 | Hiring of a company specializing in producing and updating instructional and informational teaching materials, with content in an accessible digital version, to support the training of the target audience of the courses and events offered by SMPED
19/SME/2019 | Municipal Secretariat of Education (SME) | Electronic reverse auction (pregão eletrônico) | 30/05/2019, 10:30 | Price registration for the purchase of nonperishable foods: refined sugar
007/2019 | São Paulo Transporte S/A | Electronic reverse auction | 27/05/2019, 10:00 | Purchase of six UTM-type appliances, with security licenses, installation, and technical support, for a period of 24 (twenty-four) months
109/SMADS/2019 | Municipal Secretariat of Social Assistance and Development (SMADS) | Collaboration agreement (call for proposals) | 24/05/2019, 10:00 | (illegible in source)
108/SMADS/2019 | SMADS | Collaboration agreement (call for proposals) | 24/05/2019, 10:00 | Shelter with job placement for homeless adults
001/2018/SEHAB | Municipal Secretariat of Housing (SEHAB), Office of the Secretary | Open competitive tender | 24/05/2019, 10:00 | Construction of a mixed-use social housing development called Coliseu, under the Faria Lima Consortium Urban Operation
002/SVMA/2019 | Municipal Secretariat of Green Spaces and the Environment (SVMA) | Open competitive tender | 23/05/2019, 10:30 | Hiring of specialized technical services to prepare the management plan for the Bororé-Colônia Environmental Protection Area (APA)
070/18 | São Paulo Turismo (SPTURIS) | Electronic reverse auction | 22/05/2019, 10:00 | Hiring of a company, under a unit-price contract, to provide professional civilian firefighter services for 12 (twelve) months, renewable for equal or shorter periods, according to the terms, specifications, and conditions of the call for bids and its annexes
093/2019-SMS.G | Municipal Secretariat of Health (SMS) | Electronic reverse auction | 22/05/2019, 09:00 | Price registration for the supply of crepe paper and swabs, and 70% alcohol for antisepsis
121/2019-SMS.G | SMS | Electronic reverse auction | 21/05/2019, 10:30 | Price registration for the supply of kits for the qualitative identification of the M. tuberculosis complex
18/SME/2019 | SME | Electronic reverse auction | 21/05/2019, 10:30 | Price registration for the purchase of Item A: sardines in edible oil, and Item B: canned tuna chunks
166/2019 | Municipal Hospital Authority (AHM) | Electronic reverse auction | 21/05/2019, 09:30 | Purchase of sulfamethoxazole 80 mg/mL + trimethoprim 16 mg/mL, 5 mL, for the units of the Municipal Hospital Authority
119/2019-SMS.G | SMS | Electronic reverse auction | 20/05/2019, 10:30 | Price registration for the supply of continuous self-adhesive thermal labels for thermal printing, 62 mm x 15 m
117/2019-SMS.G | SMS | Electronic reverse auction | 20/05/2019, 09:30 | Purchase of dental supplies: forceps for dental use
047/2019-HMEC | Hospital Municipal Maternidade-Escola Dr. Mário de Moraes Altenfelder Silva | Electronic reverse auction | 20/05/2019, 09:00 | Beractant intratracheal suspension, 25 mg/mL, 8.0 mL vial
103/2019-SMS.G | SMS | Electronic reverse auction | 17/05/2019, 10:30 | Price registration for the supply of laboratory supplies: sterile universal collection containers, transfer pipettes, and rayon swabs
055/2019-HMEC | Hospital Municipal Maternidade-Escola Dr. Mário de Moraes Altenfelder Silva | Electronic reverse auction | 17/05/2019, 10:00 | Disposable electrosurgical plates
002/2019 | São Paulo Obras (SP Obras) | Invitation for price quotations | 17/05/2019, 09:30 | Hiring of a specialized engineering and architecture firm to carry out renovation works to set up the DESCOMPLICA SP São Mateus unit


Figure 6.3 

Open budget data about procurement and expenses from the São Paulo prefecture in Brazil. 
Although Brazil has some of the most progressive transparency laws on the books, the data that 
are published aren't necessarily always accessible or usable by citizens and residents. In 2013, 
researcher Gisele Craveiro worked with civil society organizations to give this open budget data 
more context. Images from SIGRC for the Prefecture of São Paulo, Brazil. 
Ironically, some of the most admirable aims and actions of the open data movement
have worked against the ethical urgency of providing context, however inadvertently.
Open data describes the idea that anyone can freely access, use, modify, and
share data for any purpose. The open data movement is a loose network of organizations,
governments, and individuals. It has been active in some form since the mid-2000s,
when groups like the Open Knowledge Institute were founded and campaigns
like Free Our Data from the Guardian originated to petition governments for free access
to public records. 12 The goals are good ones in theory: economic development by
building apps and services on open data; faster scientific progress when researchers
share knowledge; and greater transparency for journalists, citizens, and residents to be
able to use public information to hold governments accountable. This final goal was
a major part of the framing of former US president Obama's well-known memorandum
on transparency and open government. 13 On his very first day in office, Obama
signed a memorandum that directed government agencies to make all data open by
default. 14 Many more countries, states, and cities have followed suit by developing
open data portals and writing open data into policy. As of 2019, seventeen countries
and over fifty cities and states have adopted the International Open Data Charter,
which outlines a set of six principles guiding the publication and accessibility of
government data. 15

In practice, however, limited public funding for technological infrastructure
has meant that governments have prioritized the "opening up" part of open data—
publishing spreadsheets of things like license applications, arrest records, and flood
zones—but have lacked the capacity to provide any context about the data's provenance, let
alone documentation that would allow the data to be made accessible and usable by the
general public. As scholar Tim Davies notes, raw data dumps might be good for starting
a conversation, but they cannot ensure engagement or accountability. 16 The reality is
that many published datasets sit idle on their portals, waiting for users to undertake the
intensive work of deciphering the bureaucratic arcana that obscures their significance.
This phenomenon has been called zombie data: datasets that have been published
without any purpose or clear use case in mind. 17

Zombies might be bad for brains, but is zombie data really a problem? Wired magazine
editor Chris Anderson would say, emphatically, "No." In a 2008 Wired article, "The 
End of Theory," Anderson made the now-infamous claim that "the numbers speak for 
themselves." 18 His main assertion was that the advent of big data would soon allow 
data scientists to conduct analyses at the scale of the entire human population, without 
needing to restrict their analysis to a smaller sample. To understand his claim, you need 
to understand one of the basic premises of statistics. 
Statistical inference is based on the idea of sampling: that you can infer things
about a population (or other large-scale phenomenon) by studying a random and/or
representative sample and then mapping those findings back onto the population
(or phenomenon) as a whole. Say that you want to know how the 323 million
people in the US will vote in the coming presidential election. You couldn't contact
all of them, of course, but you could call three thousand of them on the phone
and then use those results to predict how the rest of the people would likely vote.
There would also need to be some statistical modeling and theory involved, because
how do you know that those three thousand people are an accurate representation of
the whole population? This is where Anderson made his intervention: at the point at
which we have data collected on the entire population, we no longer need modeling,
or any other "theory" to first test and then prove. We can look directly at the data
themselves.
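To make the role of theory concrete, here is a minimal Python sketch of the kind of estimate a pollster would produce from those three thousand phone calls. The poll result (1,650 of 3,000 respondents favoring a candidate) is hypothetical, and the function name `estimate_share` is ours. The margin of error uses the standard normal approximation for a proportion, which is only meaningful if the sample is random.

```python
import math


def estimate_share(responses, z=1.96):
    """Estimate the share of a population answering True from a sample.

    Returns the sample proportion and an approximate 95% margin of error
    (normal approximation). The margin is only valid if the sample was
    drawn at random from the population.
    """
    n = len(responses)
    p_hat = sum(responses) / n  # True counts as 1, False as 0
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin


# Hypothetical poll: 1,650 of 3,000 randomly chosen respondents favor candidate A.
sample = [True] * 1650 + [False] * 1350
p_hat, margin = estimate_share(sample)
print(f"{p_hat:.1%} +/- {margin:.1%}")
```

This prints 55.0% +/- 1.8%. The margin of error is where the "theory" lives: it comes from assumptions about how the sample was drawn, not from the data alone, which is exactly the step Anderson argued full-population data would make unnecessary.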

Now, you can't write an article claiming that the basic structure of scientific inquiry
is obsolete and not expect some pushback. Anderson wrote the piece to be provocative,
and sure enough, it prompted numerous responses and debates, including those
that challenge the idea that this argument is a "new" way of thinking in the first
place (e.g., in the early seventeenth century, Francis Bacon argued for a form of inductive
reasoning, in which the scientist gathers data, analyzes them, and only thereafter
forms a hypothesis). 19 One of Anderson's major examples is Google Search. Google's
search algorithms don't need to have a hypothesis about why some websites have more
incoming links—other pages that link to the site—than others; they just need a way to
determine the number of links so they can use that number to determine the popularity
and relevance of the site in search results. We no longer need causation, Anderson
insists: "Correlation is enough." 20 But what happens when the number of links is also
highly correlated with sexist, racist, and pornographic results?
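The link-counting logic Anderson describes can be sketched in a few lines of Python. This is a deliberately simplified stand-in (ranking pages by their number of incoming links), not Google's actual ranking algorithm, and the four-page web below is hypothetical.

```python
from collections import Counter


def rank_by_incoming_links(links):
    """Rank pages by how many other pages link to them.

    `links` maps each page to the list of pages it links out to. This is a
    simplified popularity signal for illustration, not Google's algorithm.
    """
    incoming = Counter()
    for source, targets in links.items():
        for target in set(targets):  # count each linking page only once
            if target != source:  # ignore self-links
                incoming[target] += 1
    # Include pages with zero incoming links so every page gets a rank.
    pages = set(links) | set(incoming)
    # Most-linked first; ties broken alphabetically for a stable order.
    return sorted(pages, key=lambda page: (-incoming[page], page))


# Hypothetical four-page web.
web = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "c.example"],
}
print(rank_by_incoming_links(web))
```

Running it prints ['c.example', 'a.example', 'b.example', 'd.example']. Nothing in the procedure asks why pages link to one another, so whatever patterns shape the linking, including racist and sexist ones, pass straight into the ranking.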

The influence of racism, sexism, and colonialism is precisely what we see described
in Algorithms of Oppression, information studies scholar Safiya Umoja Noble's study of
the harmful stereotypes about Black and Latinx women perpetuated by search algorithms
such as Google's. As discussed in chapter 1, Noble demonstrates that Google
Search results do not simply correlate with our racist, sexist, and colonialist society;
that society causes the racist and sexist results. More than that, Google Search reinforces
these oppressive views by ranking results according to how many other sites link to
them. The rank order, in turn, encourages users to continue to click on those same
sites. Here, correlation without context is clearly not enough because it recirculates racism
and sexism and perpetuates inequality. 21
There's another reason that context is necessary for making sense of correlation, 
and it has to do with how racism, sexism, and other forces of oppression enter into 
the environments in which data are collected. The next example has to do with sexual 
assault and violence. If you do not want to read about these topics, you may want to 
skip ahead to the next section. 

In April 1986, Jeanne Clery, a student at Lehigh University, was sexually assaulted and 
murdered in her dorm room. Her parents later found out that there had been thirty-eight
violent crimes at Lehigh in the prior three years, but nobody had viewed that as 
important data that should be made available to parents or to the public. The Clerys 
mounted a campaign to improve data collection and communication efforts related 
to crimes on college campuses, and it was successful: the Jeanne Clery Act was passed 
in 1990, requiring all US colleges and universities to make on-campus crime statistics 
available to the public. 22 

So we have an ostensibly comprehensive national dataset about an important public 
topic. In 2016, three students in Catherine’s data journalism class at Emerson College— 
Patrick Torphy, Michaela Halnon, and Jillian Meehan—downloaded the Clery Act data 
and began to explore it, hoping to better understand the rape culture that has become 
pervasive on college campuses across the United States. 23 They soon became puzzled, 
however. Williams College, a small, wealthy liberal arts college in rural Massachusetts, 
seemed to have an epidemic of sexual assault, whereas Boston University (BU), a large 
research institution in the center of the city, seemed to have strikingly few cases relative
to its size and population (not to mention that several high-profile sexual assault
cases at BU had made the news in recent years). 24 The students were suspicious of these
numbers and investigated further. After comparing the Clery Act data with anonymous
campus climate surveys (figure 6.4), consulting with experts, and interviewing 
survivors, they discovered, paradoxically, that the truth was closer to the reverse of the 
picture that the Clery Act data suggest. Many of the colleges with higher reported rates 
of sexual assault were actually places where more institutional resources were being 
devoted to support for survivors. 25 
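The size of the gap the students found can be expressed as simple arithmetic. The Python sketch below uses the approximate rates quoted in figure 6.4. The survey and Clery figures measure different things (for example, harassment or assault in the Boston University survey versus reported assault in the Clery data), so the resulting multiples are rough indicators, not precise measurements. The helper name `reporting_gap` is ours.

```python
def reporting_gap(survey_rate, clery_rate):
    """Return how many times larger the survey rate is than the Clery-reported rate."""
    return survey_rate / clery_rate


# Approximate rates as quoted in figure 6.4.
# Survey and Clery categories differ, so these multiples are only rough indicators.
campuses = {
    "Boston University": (1 / 5, 1 / 2500),  # survey 2015, Clery 2014
    "Emerson College": (1 / 10, 1 / 666),  # survey 2015, Clery 2014
}

for name, (survey_rate, clery_rate) in campuses.items():
    gap = reporting_gap(survey_rate, clery_rate)
    print(f"{name}: survey rate is about {gap:.0f} times the Clery rate")
```

This prints a multiple of about 500 for Boston University and about 67 for Emerson College. The arithmetic alone cannot explain why the gaps differ; that takes the kind of context the students gathered through surveys, experts, and interviews.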

As for the colleges with lower numbers, their figures are also explained by context. The Clery
Act requires colleges and universities to provide annual reports of sexual assault and
other campus crimes, and there are stiff financial penalties for not reporting. But the
numbers are self-reported, and there are also strong financial incentives for colleges not
to report. 26 No college wants to tell the government—let alone parents of prospective
students—that it has a high rate of sexual assault on campus. This is compounded by
the fact that survivors of sexual assault often do not want to come forward—because of
social stigma, the trauma of reliving their experience, or the resulting lack of social and
psychological support. Mainstream culture has taught survivors that their experiences
will not be treated with care and that they may in fact face more harm, blame, and
trauma if they do come forward. 27

Clery report data and anonymous survey results leave vastly different impressions of rape culture on college campuses.

Boston University surveyed its students in 2015, with a response rate of 22 percent. Nearly one in five respondents reported experiencing some type of sexual harassment or assault during their time at Boston University, compared to one in 2500 who reported assault in 2014.

Emerson College surveyed its students in 2015, with a 52 percent response rate. About one in 10 respondents said they experienced nonconsensual sexual contact on-campus during their time at Emerson, compared to one in 666 students that reported forcible sex offenses in 2014.

Figure 6.4

Data journalism students at Emerson College were skeptical of the self-reported Clery Act data
and decided to compare the Clery Act results with anonymous campus climate survey results
about nonconsensual sexual contact. Although there are data-quality issues with both datasets,
the students assert that if institutions are providing adequate support for survivors, then
there will be less of a gap between the Clery-reported data and the proportion of students that
report nonconsensual sexual conduct. Courtesy of Patrick Torphy, Michaela Halnon, and Jillian
Meehan, 2016.

There are further power differentials reflected in the data when race and sexuality 
are taken into account. For example, in 2014, twenty-three students filed a complaint 
against Columbia University, alleging that Columbia was systematically mishandling 
cases of rape and sexual violence reported by LGBTQ students. Zoe Ridolfi-Starr, the 
lead student named in the complaint, told the Daily Beast, "We see complete lack of 
knowledge about the specific dynamics of sexual violence in the queer community, 
even from people who really should be trained in those issues." 28 

Simply stated, there are imbalances of power in the data setting—to use the phrase
coined by Yanni Loukissas that we discussed in chapter 5—so we cannot take the numbers
in the dataset at face value. Ignoring this imbalance of power in the collection
environment and letting the numbers "speak for themselves" would tell a story that
is not only patently false but could also be used to reward colleges that are systematically
underreporting and creating hostile environments for survivors. In effect, colleges that
deliberately undercount cases of sexual assault are rewarded for underreporting. And
the silence around sexual assault continues: the administration is silent, the campus
culture is silent, the dataset is silent. 29

Raw Data, Cooked Data, Cooking 

As demonstrated by the Emerson College students, one of the key analytical missteps
of work that lets "the numbers speak for themselves" is the premise that data are a raw
input. But as Lisa Gitelman and Virginia Jackson have memorably explained, data enter
into research projects already fully cooked—the result of a complex set of social, political,
and historical circumstances. "'Raw data' is an oxymoron," they assert, just like
"jumbo shrimp." 30 But there is an emerging class of "data creatives" whose very existence
is premised on their ability to context-hop, that is, their ability to creatively mine
and combine data to produce new insights, as well as work across diverse domains. This
group includes data scientists, data journalists, data artists and designers, researchers,
and entrepreneurs—in short, pretty much everyone who works with data right now.
They are the strangers in the dataset that we spoke of in chapter 5.

Data's new creative class is highly rewarded for producing work that creates new 
value and insight from mining and combining conceptually unrelated datasets. 
Examples include Google's now-defunct Flu Trends project, which tried to geographically
link people's web searches for flu symptoms to actual incidences of flu. 31 Another is a project
of the Sun Sentinel newspaper, in Fort Lauderdale, Florida, which combined police
license plate data with electronic toll records to prove that cops were systematically and
dangerously speeding on Florida highways. 32 Sometimes these acts of creative synthesis
work out well; the Sun Sentinel won a Pulitzer for its reporting and a number of the
speeding cops were fired. But sometimes the results are not quite as straightforward.
Google Flu Trends worked well until it didn't, and subsequent research has shown that
Google searches cannot be used as 1:1 signals for actual flu phenomena because they
are susceptible to external factors, such as what the media is reporting about the flu. 33

Instead of taking data at face value and looking toward future insights, data scientists
can first interrogate the context, limitations, and validity of the data under use.
In other words, one feminist strategy for considering context is to consider the cooking
process that produces "raw" data. As one example, computational social scientists
Derek Ruths and Jürgen Pfeffer write about the limitations of using social media data
for behavioral insights: Instagram data skews young because Instagram does; Reddit
data contains far more comments by men than by women because Reddit's overall
membership is majority men. They further show how research data acquired from
those sources are shaped by sampling because companies like Reddit and Instagram
employ proprietary methods to deliver their data to researchers, and those methods
are never disclosed. 34 Related research by Devin Gaffney and J. Nathan Matias took on
a popular corpus that claimed to contain "every publicly available Reddit comment." 35
Their work showed that the supposedly complete corpus is missing at least thirty-six
million comments and twenty-eight million submissions.

Exploring and analyzing what is missing from a dataset is a powerful way to gain
insight into the cooking process—of both the data and of the phenomenon it purports
to represent. In some of Lauren's historical work, she looks at actual cooks as they are
recorded (or not) in a corpus of thirty thousand letters written by Thomas Jefferson, as
shown in figure 6.5. 36 Some may already know that Jefferson is considered the nation's
"founding foodie." 37 But fewer know that he relied upon an enslaved kitchen staff to
prepare his famous food. 38 In "The Image of Absence," Lauren used named-entity recognition,
a natural language processing technique, to identify the places in Jefferson's
personal correspondence where he named these people and then used social network
analysis to approximate the extent of the relationships among them. The result is a
visual representation of all of the work that Jefferson's enslaved staff put into preparing
his meals but that he did not acknowledge—at least not directly—in the text of the
letters themselves.
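The two-step method described here (find the names, then map the relationships among them) can be sketched in Python. This is a simplified illustration, not Lauren's actual pipeline: plain string matching against a fixed list of names stands in for named-entity recognition, and two people count as connected whenever they are named in the same letter. The names and letter snippets are invented.

```python
from collections import Counter
from itertools import combinations


def co_mention_network(letters, names):
    """Count how often each pair of known names appears in the same letter.

    Naive substring matching against a fixed name list stands in for
    named-entity recognition; a real NER model would detect names (and
    spelling variants) without such a list.
    """
    edges = Counter()
    for text in letters:
        found = sorted({name for name in names if name in text})
        for pair in combinations(found, 2):  # each pair once, alphabetically
            edges[pair] += 1
    return edges


# Invented letter snippets and names, for illustration only.
letters = [
    "Anna and Ben prepared the dinner.",
    "Ben went to market.",
    "Thanks to Anna, Ben, and Clara for the meal.",
]
names = ["Anna", "Ben", "Clara"]
for (a, b), weight in co_mention_network(letters, names).items():
    print(f"{a} -- {b}: {weight}")
```

Here Anna and Ben are linked twice, and each is linked once to Clara. A network built this way can only show the people a letter writer chose to name; anyone left unnamed stays invisible, which is the kind of absence the project set out to examine.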


[Network visualization centered on Thomas Jefferson. Legend: Jefferson Family; Political Correspondents; Virginia Friends and Colleagues; Free Plantation Staff; Enslaved Plantation Staff; Miscellaneous.]


Figure 6.5 

In "The Image of Absence" (2013), Lauren used machine learning techniques to identify the 
names of the people whom Thomas Jefferson mentioned in his personal correspondence and 
then visualized the relationships among them. The result demonstrates all of the work that his 
enslaved staff put into preparing Jefferson's meals but that was not directly acknowledged by Jefferson
himself. Visualization by Lauren F. Klein.


On an even larger scale, computer scientists and historians at Stanford University 
used word embeddings—another machine learning technique—to explore gender and 
ethnic stereotypes across the span of the twentieth century. 39 Using several large datasets
derived from sources such as Google Books and the New York Times, the team
showed how words like intelligent, logical, and thoughtful were strongly associated with
men until the 1960s. Since that time, however, those words have steadily increased
in association with women. The team attributed this phenomenon to the "women's
movement in the 1960s and 1970s," making their work an interesting example of an
attempt to quantify the impact of social movements. The paper is also notable for
openly acknowledging how the team's methods, which involved looking at the adjectives
surrounding the words man and woman, limited the scope of their analysis to the gender
binary. Furthermore, the researchers did not try to assert that the data represent
how women and men "are," nor did they try to "remove the bias" so that they could
develop "unbiased" applications in other domains. They saw the data as what they
are—cultural indicators of the changing face of patriarchy and racism—and interrogated
them as such.
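A minimal Python sketch of how an embedding-based gender association can be scored: compare a word's cosine similarity to "man" with its cosine similarity to "woman." The three-dimensional vectors are invented for illustration (real embeddings are learned from a corpus, for example one per decade, and have hundreds of dimensions), and this cosine-difference score is a simplified measure, not necessarily the one the Stanford team used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_association(word_vec, man_vec, woman_vec):
    """Positive: closer to 'man'; negative: closer to 'woman'.

    Defined by a single man/woman pair, so the gender binary is built
    into the measurement itself.
    """
    return cosine(word_vec, man_vec) - cosine(word_vec, woman_vec)


# Toy vectors, invented for illustration; not taken from any real embedding.
man = [1.0, 0.0, 0.2]
woman = [0.0, 1.0, 0.2]
logical_earlier = [0.9, 0.1, 0.3]  # hypothetical vector from an earlier decade
logical_later = [0.5, 0.5, 0.3]  # hypothetical vector from a later decade

print(round(gender_association(logical_earlier, man, woman), 3))
print(round(gender_association(logical_later, man, woman), 3))
```

With these toy vectors, "logical" scores about 0.82 (closer to "man") in the earlier vector and 0.0 in the later one. Because the score is defined by a single man/woman pair, it carries the same gender-binary limitation the researchers acknowledged.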

So, how do we produce more work like this—work that understands data as already 
"cooked" and then uses that data to expose structural bias? Unfortunately for Chris 
Anderson, the answer is that we need more theory, not less. Without theory, survey 
designers and data analysts must rely on their intuition, supported by "common sense" 
ideas about the things they are measuring and modeling. This reliance on "common 
sense" leads directly down the path to bias. Take the case of GDELT. Decades of research 
has demonstrated that events covered by the media are selected, framed, and shaped 
by what are called "news values": values that confirm existing images and ideologies. 40 
So what is it really that GDELT is measuring? What events are happening in the world, 
or what the major international news organizations are focusing their attention on? 
The latter might be the most powerful story embedded in the GDELT database. But it 
requires deep context and framing to draw it out. 

Refusing to acknowledge context is a power play to avoid power. It's a way to assert 
authoritativeness and mastery without being required to address the complexity of 
what the data actually represent: the political economy of the news in the case of 
GDELT, entrenched gender hierarchies and flawed reporting environments in the case 
of the Clery data, and so on. But deep context and computation are not incompatible. 
For example, SAFElab, a research lab at Columbia run by scholar and social worker 
Desmond Patton, uses artificial intelligence to examine the ways that youth of color 
navigate violence on and offline. He and a team of social work students use Twitter data 
to understand and prevent gang violence in Chicago. Their data are big, and they're 
also complicated in ways that are both technical and social. The team is acutely aware 
of the history of law enforcement agencies using technology to surveil Black people, 
for example, and acknowledges that law enforcement continues to do so using Twitter 
itself. What's more, when Patton started his research, he ran into an even more basic 
problem: "I didn't know what young people were saying, period." 41 This was true even 
though Patton himself is Black, grew up in Chicago, and worked for years in many 
of these same neighborhoods. "It became really clear to me that we needed to take a 
deeper approach to social media data in particular, so that we could really grasp culture, 
context and nuance, for the primary reason of not misinterpreting what's being said,"
he explains. 42

Patton's approach to incorporating culture, context, and nuance took the form of 
direct contact with and centering the perspectives of the youth whose behaviors his 
group sought to study. Patton and doctoral student William Frey hired formerly gang-involved
youth to work on the project as domain experts. These experts coded and categorized
a subset of the millions of tweets, then trained a team of social work students
to take over the coding. The process was long and not without challenges. It required
that Patton and Frey create a new "deep listening" method they call the contextual
analysis of social media to help the student coders mitigate their own bias and get closer
to the intended meaning of each tweet. 43 The step after that was to train a machine
learning classifier to automatically label the tweets, so that the project could categorize
all of the millions of tweets in the dataset. Says Patton, "We trained the algorithm to
think like a young African American man on the south side of Chicago." 44

This approach illustrates how context can be integrated into an artificial intelligence
project, and how this can be done with attention to subjugated knowledge. This term describes
the forms of knowledge that have been pushed out of mainstream institutions and the
conversations they encourage. To explain this phenomenon, Patricia Hill Collins gives
the example of how Black women have historically turned to "music, literature, daily
conversations, and everyday behavior" as a result of being excluded from "white male-controlled
social institutions." 45 These institutions include academia, or—for a recent
example raised by sociologist Tressie McMillan Cottom—the op-ed section of the New
York Times. 46 And because Black women circulate their knowledge in places outside of those
mainstream institutions, that knowledge is not seen or recognized by those institutions:
it becomes subjugated.

The idea of subjugated knowledge applies to other minoritized groups as well, including
the Black men from Chicago whom Patton sought to understand. An approach that
did not attend to this context would have resulted in significant errors. For example,
a tweet like "aint kill yo mans & ion kno ya homie" would likely have been classified
as aggressive or violent because of its use of the word "kill." But drawing on the knowledge
provided by the young Black men they hired for the project, Frey and Patton were
able to show that many tweets like this one were references to song lyrics, in this case
by the Chicago rapper Lil Durk. In other words, these tweets are about sharing culture, not
communicating threats. 47

In the case of SAFElab, as with all research projects that seek to make use of subjugated
knowledge, there is also significant human, relational infrastructure required.
Frey and Patton have built long-term relationships with individuals and organizations
in the community they study. Indeed, Frey lives and works in the community. In addition,
both Frey and Patton are trained as social workers. This is reflected in their computational
work, which remains guided by the social worker's code of ethics. 48 They
are using AI to broker new forms of human understanding across power differentials,
rather than using computation to replace human relationships. This kind of social
innovation often goes underappreciated in the unicorn-wizard-genius model of data
science. (For more on unicorns, see chapter 5.) As Patton says, "We had a lot of challenges
with publishing papers in data science communities about this work, because it
is very clear to me that they're slow to care about context. Not that they don't care, but
they don't see the innovation or the social justice impact that the work can have." 49
Hopefully that will change in the future, as the work of SAFElab and others demonstrates
the tremendous potential of combining social work and data science.

Communicating Context 

It's not just in the stages of data acquisition or data analysis that context matters. Context
also comes into play in the framing and communication of results. Let's imagine
a scenario. In this case, you are a data journalist, and your editor has assigned you to
create a graphic and short story about a recent research study: "Disparities in Mental
Health Referral and Diagnosis in the New York City Jail Mental Health Service." 50 This
study looks at the medical records of more than forty-five thousand first-time incarcerated
people and finds that some groups are more likely to receive treatment, while
others are more likely to receive punishment. More specifically, white people are more
likely to receive a mental health diagnosis, while Black and Latinx people are more
likely to be placed in solitary confinement. The researchers attribute some of this divergence
to the differing diagnosis rates experienced by these groups before becoming 
incarcerated, but they also attribute some of the divergence to discrimination within 
the jail system. Either way, the racial and ethnic disparities are a product of structural 
racism. 

Consider the difference between the two graphics shown in figure 6.6. The only 
variation is the title and framing of the chart. 

Which one of these graphics would you create? Which one should you create? The
first—Mental Health in Jail—represents the typical way that the results of a data analysis
are communicated. The title appears to be neutral and free of bias. This is a graphic
about rates of mental illness diagnosis of incarcerated people broken down by race and
ethnicity. The people are referred to as inmates, the language that the study used. The
title does not mention race or ethnicity, or racism or health inequities, nor does the
title point to what the data mean. But this is where additional questions about context
come in. Are you representing only the four numbers that we see in the chart? Or are
you representing the context from which they emerged?

[Two charts of the same data, each showing mental health diagnosis rates for four groups: White, Black, Hispanic, and Other. First chart title: "Mental Health in Jail," subtitle "Rate of mental health diagnosis of inmates." Second chart title: "Racism in Jail," subtitle "People of color less likely to get mental health diagnosis."]

Figure 6.6

Two portrayals of the same data analysis. The data are from a study of people incarcerated for the
first time in NYC jails between 2011 and 2013. Graphics by Catherine D'Ignazio. Data from Fatos
Kaba et al., "Disparities in Mental Health Referral and Diagnosis in the New York City Jail Mental
Health Service."

The study that produced these numbers contains convincing evidence that we
should distrust diagnosis numbers due to racial and ethnic discrimination. The first
chart not only fails to communicate that but also actively undermines that main
finding of the research. Moreover, using the word inmates to refer to people in jail
is dehumanizing, particularly in the context of the epidemic of mass incarceration
in the United States. 51 So, consider the second chart: Racism in Jail: People of Color
Less Likely to Get Mental Health Diagnosis. This title offers a frame for how to interpret
the numbers along the lines of the study from which they emerged. The research
study was about racial disparities, so the title and content of this chart are about racial
disparities. The people behind the numbers are people, not inmates. In addition, and
crucially, the second chart names the forces of oppression that are at work: racism
in jail.

Although naming racism may sound easy and obvious to some readers of this book, 
it is important to acknowledge that fields like journalism still adhere to conventions 
that resist such naming on the grounds that it is "bias" or "opinion." John Daniszewski, 
an editor at the Associated Press, epitomizes this view: "In general our policy is to try 
to be neutral and precise and as accurate as we possibly can be for the given situation. 
We're very cautious about throwing around accusations of our own that characterize 
something as being racist. We would try to say what was done, and allow the reader to 
make their own judgement." 52 

Daniszewski's statement may sound democratic ("power to the reader!"), but it's 
important to think about whose interests are served by making racism a matter of individual 
opinion. For many people, racism exists as a matter of fact, as we have discussed 
throughout this book. Its existence is supported by the overwhelming empirical evidence 
that documents instances of structural racism, including wealth gaps, wage gaps, 
and school segregation, as well as health inequities, as we have also discussed. Naming 
these structural forces may be the most effective way to communicate broad context. 
Moreover, as the data journalist in this scenario, it is your responsibility to connect 
the research question to the results and to the audience's interpretation of the results. 
Letting the numbers speak for themselves is emphatically not more ethical or more 
democratic because it often leads to those numbers being misinterpreted or the results 
of the study being lost. Placing numbers in context and naming racism or sexism when 
it is present in those numbers should be a requirement—not only for feminist data 
communication, but for data communication full stop. 

This counsel—to name racism, sexism, or other forces of oppression when they are 
clearly present in the numbers—particularly applies to designers and data scientists 
from the dominant group with respect to the issue at hand. White people, including 
ourselves, the authors of this book, have a hard time naming and talking about racism. 
Men have a hard time naming and talking about sexism and patriarchy. Straight people 
have a hard time seeing and talking about homophobia and heteronormativity. If you 
are concerned with justice in data communication, or data science more generally, we 
suggest that you practice recognizing, naming, and talking about these structural forces 
of oppression. 53 

But our work as hypothetical anti-oppression visualization designers is not over yet. 
We might have named racism as a structural force in our visualization, but there are 
still two problems with the "good" visualization, and they hinge on the wording of the 
subtitle: People of Color Less Likely to Get Mental Health Diagnosis. The first problem 
is that this is starting to look like a deficit narrative, which we discuss in chapter 2—a 
narrative that reduces a social group to negative stereotypes and fails to portray them 
with creativity and agency. The second issue is that by naming racism and then talking 
about people of color in the title, the graphic reinforces the idea that race is an issue 
for people of color only. If we care about righting the balance of power, the choice of 
words matters as much as the data under analysis. In an op-ed about the language used 
to describe low-income communities, health journalist Kimberly Seals Allers affirms 
this point: "We almost always use a language of deficiency, calling them disadvantaged, 
under-resourced and under-everything else. ... It ignores all the richness those communities 
and their young people possess: the wealth of resiliency, tenacity and grit that 
can turn into greatness if properly cultivated." 54 

So let's give it a third try, with the image in figure 6.7. 

In this third version, we have retained the same title as the previous chart. But 
instead of focusing the subtitle on what minoritized groups lack, it focuses on the 
unfair advantages that are given to the dominant group. The subtitle now reads, White 
People Get More Mental Health Services. This avoids propagating a deficit narrative 
that reinforces negative associations and clichés. It also asserts that white people have 
a race, and that they derive an unfair advantage from that race in this case. 55 Finally, 
the title is proposing an interpretation of the numbers that is grounded in the context 
of the researchers' conclusions on health disparities. 
[Figure panel: Racism in Jail: White people get more mental health diagnoses. The chart compares the categories White, Black, Hispanic, and Other.]

Figure 6.7 

A third portrayal of the same data, with only the framing title and subtitle changed. Graphic by 
Catherine D'Ignazio. Data from Fatos Kaba et al., "Disparities in Mental Health Referral and 
Diagnosis in the New York City Jail Mental Health Service." 

Restoring Context 

Three iterations on a single chart title might feel excessive, but it also helps to underscore 
the larger point that considering context always involves some combination of 
interest and time. Fortunately, there is a lot of energy around issues of context right 
now, and educators, journalists, librarians, computer scientists, and civic data publishers 
are starting to develop more robust tools and methods for keeping context attached 
to data so that it's easier to include in the end result. 

For example, remember figure 6.3, that confusing chart of government procurements 
in São Paulo that we discussed earlier in this chapter? Gisele Craveiro, a professor 
at the University of São Paulo, has created a tool called Cuidando do Meu Bairro (Caring 
for My Neighborhood) to make that spending data more accessible to citizens by adding 
local context to the presentation of the information. 56 In the classroom, 
Heather Krause, a data scientist and educator, has developed the concept of the "data 
biography." 57 Prior to beginning the analysis process, Krause asks people working with 
data, particularly journalists, to write a short history of a particular dataset and answer 
five basic questions: Where did it come from? Who collected it? When? How was it 
collected? Why was it collected? A related but slightly more technical proposal advocated 
by researchers at Microsoft is being called datasheets for datasets. 58 Inspired by the 
datasheets that accompany hardware components, computer scientist Timnit Gebru 
and colleagues advocate for data publishers to create short, three- to five-page documents 
that accompany datasets and outline how they were created and collected, what 
data might be missing, whether preprocessing was done, and how the dataset will be 
maintained, as well as a discussion of legal and ethical considerations such as whether 
the data collection process complies with privacy laws in the European Union. 59 

Another emerging practice that attempts to better situate data in context is the 
development of data user guides. 60 Bob Gradeck, manager of the Western Pennsylvania 
Regional Data Center, started writing data user guides because he got the same 
questions over and over again about popular datasets he was managing, like property 
data and 311 resident reports in Pittsburgh. Reports Gradeck, "It took us some 
time to learn tips and tricks. ... I wanted to take the stuff that was in my head and 
put it out there with additional context, so other data users didn't have to do it 
from scratch." 61 Data user guides are simple, written documents that each contain a 
narrative portrait of a dataset. They describe, among other things, the purpose and 
application of the data; the history, format, and standards; the organizational context; 
other analyses and stories that have used the dataset; and the limitations and 
ethical implications of the dataset. This is similar to the work that data journalists 
are doing to compile datasets and then make them available for reuse. For example, 
the Associated Press makes comprehensive national statistics about school segregation 
in the United States available for purchase. 62 The spreadsheets are accompanied by a 
twenty-page narrative explainer about the data that includes limitations and sample 
story ideas. 

These developments are exciting, but there is further to go with respect to issues of 
power and inequality that affect data collection environments. For example, professor 
of political science Valerie Hudson has worked for decades to trace the links between 
state security and the status of women. "I was interested in whether forms of oppression 
or subordination or violence against women were related to national, and perhaps 
international, instability and conflict," she explains. She and geographer Chad Emmett 
started the project WomanStats as a modest Excel spreadsheet in 2001. It has since 
grown to a large-scale web database with over a quarter of a million data points, including 
over 350 variables ranging from access to health care to the prevalence of rape to 
the division of domestic labor. 63 
Notably, their sources are qualitative as well as quantitative. Says Hudson, "If you 
want to do research on women, you have to embrace qualitative data. There's no two 
ways about it, because the reality of women's lives is simply not captured in quantitative 
statistics. Absolutely not." 64 At present, WomanStats includes two types of 
qualitative variables: practice variables are composed from women's reports of their 
lived experiences, and law variables are coded from the legal frameworks in a particular 
country. Indeed, the WomanStats codebook is a context nerd's dream that outlines 
measurement issues and warns about the incompleteness of its own data, especially 
with respect to difficult topics. 65 In regard to the data that records reports of rape, for 
example—a topic upsetting enough to even consider, let alone contemplate its scale 
and scope in an entire country—the codebook states: "CAVEAT EMPTOR! Users are 
warned that this scale only reflects reported rape rates, and for many, if not most, countries, 
this is a completely unreliable indicator of the actual prevalence of rape within a 
society!" 66 Instead of focusing on a single variable, users are directed to WomanStats's 
composite scales, like the Comprehensive Rape Scale, which look at reported prevalence 
in the context of laws, whether laws are enforced, reports from lived experience, 
strength of taboos in that environment, and so on. 

So tools and methods for providing context are being developed and piloted. And 
WomanStats models how context can also include an analysis of unequal social power. 
But if we zoom out of project-level experiments, what remains murky is this: Which 
actors in the data ecosystem are responsible for providing context? 

Is it the end users? In the case of the missing Reddit comments, we see how even 
the most highly educated among us fail to verify the basic claims of their data source. 
And datasheets for datasets and data user guides are great, but can we expect individual 
people and small teams to conduct an in-depth background research project while 
on a deadline and with a limited budget? This places unreasonable expectations and 
responsibility on newcomers and is likely to lead to further high-profile cases of errors 
and ethical breaches. 

So is it the data publishers? In the case of GDELT, we saw how data publishers, in 
their quest for research funding, overstated their capabilities and didn't document the 
limitations of their data. The Reddit comments were a little different: the dataset was 
provided by an individual acting in good faith, but he did not verify—and probably did 
not have the resources to verify—his claim to completeness. In the case of the campus 
sexual assault data, it's the universities who are responsible for self-reporting, and they 
are governed by their own bottom line. 67 The government is under-resourced to verify 
and document all the limitations of the data. 
Is it the data intermediaries? Intermediaries, who have also been called infomediaries, 
might include librarians, journalists, nonprofits, educators, and other public 
information professionals. 68 There are strong traditions of data curation and management 
in library science, and librarians are often the human face of databases for 
citizens and residents. But as media scholar Shannon Mattern points out, librarians 
are often left out of conversations about smart cities and civic technology. 69 Examples 
of well-curated, verified, and contextualized data from journalism, like the Associated 
Press database on school segregation or other datasets available in ProPublica's data 
store, are also promising. 70 The nonprofit Measures for Justice provides comprehensive 
and contextualized data on criminal justice and incarceration rates in the United 
States. 71 Some data intermediaries, like Civic Switchboard in Pittsburgh, are building 
their own local data ecosystems as a way of working toward sustainability and 
resilience. 72 These intermediaries, who clean and contextualize the data for public use, 
have potential (and have fewer conflicts of interest), but sustained funding, significant 
capacity-building, and professional norms-setting would need to take place to do this 
at scale. 

Houston, we have a public information problem. Until we invest as much in providing 
(and maintaining) context as we do in publishing data, we will end up with public 
information resources that are subpar at best and dangerous at worst. This ends up getting 
even more thorny as the sheer quantity of digital data complicates the verification, 
provenance, and contextualization work that archivists have traditionally undertaken. 
Context, and the informational infrastructure that it requires, should be a significant 
focus for open data advocates, philanthropic foundations, librarians, researchers, news 
organizations, and regulators in the future. Our data-driven lives depend on it. 

Consider Context 

The sixth principle of data feminism is to consider context. The bottom line for numbers 
is that they cannot speak for themselves. In fact, those of us who work with data must 
actively prevent numbers from speaking for themselves because when those numbers 
derive from a data setting influenced by differentials of power, or by misaligned collection 
incentives (read: pretty much all data settings), and especially when the numbers 
have to do with human beings or their behavior, then they run the risk not only of 
being arrogantly grandiose and empirically wrong, but also of doing real harm in their 
reinforcement of an unjust status quo. 

The way through this predicament is by considering context, a process that includes 
understanding the provenance and environment from which the data was collected, 
as well as working hard to frame context in data communication (i.e., the numbers 
should not speak for themselves in charts any more than they should in spreadsheets). 
It also includes analyzing social power in relation to the data setting. Which power 
imbalances have led to silences in the dataset or data that is missing altogether? Who 
has conflicts of interest that prevent them from being fully transparent about their 
data? Whose knowledge about an issue has been subjugated, and how might we begin 
to recuperate it? The energy around context, metadata, and provenance is impressive, 
but until we fund context, excellent contextual work will remain the exception 
rather than the norm. 


7 Show Your Work 


Principle: Make Labor Visible 

The work of data science, like all work in the world, is the work of many hands. Data feminism 
makes this labor visible so that it can be recognized and valued. 


If you work in software development, chances are that you have a GitHub account. 
As of June 2018, the online code-management platform had over twenty-eight million 
users worldwide. By allowing users to create web-based repositories of source code 
(among other forms of content) to which project teams of any size can then contribute, 
GitHub makes collaborating on a single piece of software or a website or even a book 
much easier than it has ever been before. 

Well, easier if you're a man. A 2016 study found that female GitHub users were less 
likely to have their contributions accepted if they identified themselves in their user 
profiles as women. (The study did not consider nonbinary genders.) 1 Critics of GitHub's 
commitment to inclusivity, or the lack thereof, also point to the company's internal politics. 
In 2014, GitHub's cofounder was forced to resign after allegations of sexual harassment 
were brought to light. 2 More recently, in 2018, Agnes Pak, a former top attorney 
at GitHub, sued the company for allegedly altering her performance reviews after she 
complained about her gender and race contributing to a lower compensation package, 
thereby giving the company grounds to fire her. 3 Pak's suit came only shortly after transgender 
software developer Coraline Ada Ehmke, in 2017, declined a significant severance package so 
that she could talk publicly about her negative experience of working at GitHub. 4 Clearly, 
GitHub has several major issues of corporate culture that it must address. 

But a corporate culture that is hostile to women does not necessarily preclude other 
feminist interventions. And here GitHub makes an important one: its platform helps 
show the work of writing collaborative code. In addition to basic project management 
tools, like bug tracking and feature requests, the GitHub platform also generates visualizations 
of each team member's contributions to a project's codebase. Area charts, 
arranged in small multiples, allow viewers to compare the quantity, frequency, and 
duration of any particular member's contributions (figure 7.1a). A line graph reveals 
patterns in the day of the week when those contributions took place (figure 7.1b). And 
a flowchart-like diagram of the relationships between various branches of the project's 
code helps to acknowledge any sources for the project that might otherwise go uncredited, 
as well as any additional projects that might build upon the project's initial work 
(figure 7.1c). 

Figure 7.1 

(a) The first of three visualizations of the code associated with a project from Lauren's research 
group, showing the significant contributions of student researchers between the years 2014 and 
2019. (b) A bar chart shows the frequency of code commits over time, and a line graph shows any 
patterns in the day of the week when the commits were made. (c) A flowchart-like diagram 
documents the relationships between the various branches of the project's codebase. 
Screenshots by Lauren F. Klein. 

Coding is work, as anyone who's ever programmed anything knows well. But it's not 
always work that is easy to see. The same is true for collecting, analyzing, and visualizing 
data. We tend to marvel at the scale and complexity of an interactive visualization 
like the Ship Map, in figure 7.2, which plots the paths of the global merchant fleet 
over the course of the 2012 calendar year. 5 By showing every single sea voyage, the 
Ship Map exposes the networks of waterways that constitute our global product supply 
chain. But we are less often exposed to the networks of processes and people that 
help constitute the visualization itself—from the seventy-five corporate researchers at 
Clarksons Research UK who assembled and validated the underlying dataset, to the 
academic research team at University College London's Energy Institute that developed 
the data model, to the design team at Kiln that transformed the data model into the 
visualization that we see. And that is to say nothing of the tens of thousands of commercial 
ships that served as the source of data in the first place. Visualizations like the 
Ship Map involve the work of many hands. 

Unfortunately, however, when releasing a data product to the public, we tend not 
to credit the many hands who perform this work. We often cite the source of the 
dataset, and the names of the people who designed and implemented the code and 
graphic elements. But we rarely dig deeper to discover who created the data in the first 
place, who collected the data and processed them for use, and who else might have 
labored to make creations like the Ship Map possible. Admittedly, this information is 
sometimes hard to find. And when project teams (or individuals) are already operating 
at full capacity, or under budgetary strain, this information can—ironically—simply be 
too much additional work to pursue. 6 Even in cases in which there are both resources 
and desire, information about the range of the contributors to any particular project 
sometimes can't be found at all. But the various difficulties we encounter when trying 
to acknowledge this work reflect a larger problem in what information studies scholar 
Miriam Posner calls our data supply chain. 7 Like the contents of the ships visualized 
on the Ship Map, about which we only know sparse details—the map can tell us if a 
shipping container was loaded onto the boat, but not what the shipping container 
contains—the invisible labor involved in data work, as Posner argues, is something that 
corporations have an interest in keeping out of public view. 

Figure 7.2 

A time-based visualization of global shipping routes, designed by Kiln in 2016, based on data from 
the University College London Energy Institute (UCL EI). The Ship Map website was created by 
Duncan Clark and Robin Houston from Kiln, and the dataset was compiled by Julia Schaumeier 
and Tristan Smith from the UCL EI. The website also includes a soundtrack: Bach's Goldberg 
Variations, played by Kimiko Ishizaka. 

To put it more simply, it's not a coincidence that much of the work that goes into 
designing a data product—visualization, algorithm, model, app—remains invisible and 
uncredited. In our capitalist society, we tend to value work that we can see. This is 
the result of a system in which the cultural worth of any particular form of work is 
directly connected to the price we pay for it; because a service costs money, we recognize 
its larger value. But more often than not, the reverse also holds true: we fail to 
recognize the larger value of the services we get for free. When, in the early 1970s, the 
International Feminist Collective launched the Wages for Housework campaign, it was 
this phenomenon of invisible labor—labor that was unpaid and therefore unvalued— 
that the group was trying to expose (figure 7.3). 8 The precise term they used to describe 
this work was reproductive labor, which comes from the classical economic distinction 
between the paid and therefore economically productive labor of the marketplace, and 
the unpaid and therefore economically unproductive labor of everything else. By reframing 
this latter category of work as reproductive labor, rather than simply (and inaccurately) 
unproductive labor, groups like the International Feminist Collective sought 
to emphasize how the range of tasks that the term encompassed, like cooking and 
cleaning and child-rearing, were precisely the tasks that enabled those who performed 
"productive" labor, like office or factory work, to continue to do so. 

The Wages for Housework movement began in Italy and migrated to the United 
States with the help of labor organizer and theorist Silvia Federici. It eventually claimed 
chapters in several American cities, and did important consciousness-raising work. 9 
Still, as prominent feminists like Angela Davis pointed out, while housework might 
have been unpaid for white women, women of color—especially Black women in the 
United States—had long been paid, albeit not well, for their housework in other people's 
homes: "Because of the added intrusion of racism, vast numbers of Black women 
have had to do their own housekeeping and other women's home chores as well." 10 
Here, Davis is making an important point about racialized labor: just as housework is 
structured along the lines of gender, it is also structured along the lines of race and 
class. The domestic labor of women of color was and remains underwaged labor, as feminist 
labor theorists would call it, and its low cost was what permitted (and continues to 
permit) many white middle- and upper-class women to participate in the more lucrative 
waged labor market instead. 11 

Figure 7.3 

A Wages for Housework march, 1977. Photograph by Bettye Lane. Courtesy of the Schlesinger 
Library, Radcliffe Institute/Bettye Lane. 

Since the 1970s, the term invisible labor has come to encompass the various forms 
of labor, unwaged, underwaged, and even waged, that are rendered invisible because 
they take place inside of the home, because they take place out of sight, or because they 
lack physical form altogether. 12 Visit WagesforFacebook.com and you'll find a version 
of the Wages for Housework argument updated for a new form of invisible work. This 
invisible labor can be found all over the web, as digital labor theorists such as Tiziana 
Terranova have helped us to understand. 13 "They call it sharing. We call it stealing," is 
one of the statements that scrolls down the screen in large black type. The word "it" refers 
to work that most of us perform every day, in the form of our Facebook likes, Instagram 
posts, and Twitter tweets. The point made by Laurel Ptak, the artist behind Wages for 
Facebook—a point also made by Terranova—is that the invisible unpaid labor of our 
likes and tweets is precisely what enables the Facebooks and Twitters of the world to 
profit and thrive. 

The Invisible Labor of Data Science 

The world of data science is able to profit and thrive because of unpaid invisible labor 
as well. How did Netflix improve its movie recommendation algorithm? The company 
crowdsourced it. 14 How did the Guardian, the British newspaper, determine which 
among two million leaked documents might contain incriminating information about 
government misspending? The paper crowdsourced it. 15 The optical character recognition 
(OCR) error correction performed on the dataset of early modern books that you 
downloaded for your text-analysis project? That was crowdsourced, too. 16 

Each of these crowdsourcing projects was framed as an act of benevolence (and, in 
the case of Netflix, an opportunity to win a million-dollar prize). People should want 
to contribute to these projects, their proponents claimed, since their labor would further 
the public good. 17 However, Ashe Dryden, the software developer and diversity 
consultant, points out that people can only help crowdsource if they have the inclination 
and the time. 18 Think back to that study of GitHub. If you were a woman and you 
knew your contributions to a programming project were less likely to be accepted than 
if you were a man, would that motivate you to contribute to the project? Or, for another 
example, consider Wikipedia. Although the exact gender demographics of Wikipedia 
contributors are unknown, numerous surveys have indicated that those who contribute 
content to the crowdsourced encyclopedia are between 84 percent and 91.5 percent 
men. 19 Why? It could be that there, too, edits are less likely to be accepted if they come 
from women editors. 20 It could also be attributed to Wikipedia's exclusionary editing 
culture and technological infrastructure, as science and technology studies (STS) scholars 
Heather Ford and Judy Wajcman have argued. 21 And there is also reason to go 
back to the housework argument. Dryden cites a 2011 study showing that women in 
twenty-nine countries spend more than twice as much time on household tasks as 
men do, even when controlling for women who hold full-time jobs. 22 The study did 
not consider nonbinary genders or same-sex (or other non-hetero-typical) households. 
But even as a rough estimate, it seems that women simply don't have as much time. 23 

In capitalist societies, it's very often the case that time is money. But it's also important 
to remember to ask whose time is being spent and whose money is being saved. 
The premise behind Amazon's Mechanical Turk—or MTurk, as the crowdsourcing 
platform is more commonly known—is that data scientists want to save their own 
time and their own bottom line. 24 The MTurk website touts its access to a global marketplace 
of "on-demand Workers," who are advertised as being more "scalable and 
cost-effective" than the "time consuming [and] expensive" process of hiring actual 
employees. 25 But the data-entry and data-processing tasks performed by these workers 
earn them less than minimum wage, even as a recent study by the Pew Research Center 
showed that 51 percent of US-based Turkers, as they are known, hold college degrees, 
and 88 percent are below the age of fifty, among other metrics that would otherwise 
rank them among the most desired demographic for salaried employees. 26 This form 
of underwaged work is also increasingly outsourced from the United States to countries 
with fewer (or worse) labor laws and fewer (or worse) opportunities for economic 
advancement. A 2010 University of California, Irvine study measured a 20 percent drop 
in the number of US-based Turkers over the eighteen months that it monitored. 27 This 
trend has continued, the real-time MTurk tracker shows. (The gender split, interestingly, 
has evened out over time.) 

Even at resource-rich companies like Amazon and Google, the work of data entry 
is profoundly undervalued in proportion to the knowledge it helps to create. Andrew 
Norman Wilson's 2011 documentary Workers Leaving the Googleplex (figure 7.4) exposes 
how the workers tasked with scanning the books for the Google Books database are 
hired as a separate but unequal class of employee, with ID cards that restrict their access 
to most of the Google campus and that prevent them from enjoying the company’s 
famed employee perks. 28 (Evidently, working overtime to preserve the world's cultural 
heritage still does not entitle you to a free lunch, let alone a free class on how to cook 
Pad Kee Mao.) 29 

Wilson also observes that Google's book-scanning workers are disproportionately 
women and people of color—a fact that would not surprise the long line of women 
of color scholar-activists, including Angela Davis, Patricia Hill Collins, and Evelyn 
Nakano Glenn, who have insisted that economic oppression be recognized as a vector 
that cuts across the matrix of domination as a whole. Information studies scholar Lilly 
Irani confirms that "today's hierarchy of data labor echoes older gendered, classed, and 
raced technology hierarchies." 30 Here, Irani compares the hierarchy of data labor to 
the hierarchy encountered by the first generation of female computers, like Christine 
Darden, whom we discussed in this book's introduction. 31 But Irani's own research also 
considers contemporary digital labor practices, and in particular, Amazon's Mechanical 
Turk, the people it employs, and the people it exploits. In 2008, Irani and collaborators
built a web tool called the Turkopticon, which enabled Turkers to anonymously
report unfair labor conditions, as well as any additional information that might help
them decide whether to accept future tasks. 32 Irani envisioned the Turkopticon as a
worker-led project. However, the same unfair labor conditions that necessitated the
tool also ultimately limited its reach. In 2018, after ten years of service, its all-volunteer
team of moderators called it quits. "We're all burned out," they wrote on Twitter. 33 And
amid tagging images and correcting error-laden text, no other Turkers could find
the time to take their place.

Figure 7.4

Andrew Norman Wilson's Workers Leaving the Googleplex (2011) documents the hidden inequities
at Google's Mountain View headquarters. Still courtesy of Andrew Norman Wilson.

The people who perform this cultural data work, as Irani terms it, are not only found 
on the MTurk platform, however. They're also increasingly the people on whom the 
entire information economy depends. Cultural data workers are responsible for the 
invisible labor involved in moderating the veritable deluge of content produced online 
every day, ensuring that your Facebook feed is free of, for example, child pornography 
and violent propaganda videos. When a 2014 exposé in Wired magazine documented
the emotional costs of this labor, performed by some of the least empowered of these
workers—women in the Global South—it was met with an outpouring of shock and
outrage. 34 But subsequent studies like Ghost Work, by anthropologist Mary Gray and
computer scientist Siddharth Suri, have documented the existence of a large "global
underclass" performing this work of content moderation, transcription, and captioning. 35
They make the point that the so-called automation of artificial intelligence relies
on a vast number of human beings in the loop. 36 Moreover, while the demographics
of Silicon Valley tech workers remain largely young, white, and male, these global
"ghost workers" are often older women of color, and they are always required to accept
precarious labor conditions.

Those who study the human costs of global capitalism would be quick to point out 
that this exploitation of precarious, racialized, colonial labor has a long history, one 
that has its roots in the original form of human exploitation: slavery. Slavery and capitalism
are closely connected, after all, and one infamous story is often told to illustrate
this point: in 1781, the British slave ship Zong made a series of navigational errors 
while crossing the Atlantic, resulting in a shortage of drinking water for the seventeen 
crew members and 133 captives on board. 37 After performing a cost-benefit analysis, 
the captain decided to throw the crew's enslaved human "cargo" overboard so that the 
crew members could consume all of the remaining water and rations themselves. The 
decision was made because the captain calculated that he could collect enough insurance
money on his captives’ loss of life to come out ahead, even if he couldn't sell them
once they landed ashore. He was thinking about human lives solely in terms of their 
market value—the notion that capitalism holds in highest regard. 

The stark inhumanity of this calculation has prompted numerous scholars and artists 
to return to the Zong as they reckon with what Christina Sharpe calls, in the language 
of the ship, the "wake" of slavery. 38 Poet M. NourbeSe Philip, for example, composed 
a book-length poem, Zong!, using only the words of the legal case that serves as the 
sole documentation of the original event. 39 Written over the course of many years and 
published in 2011, Philip's poem plucks words and short phrases out of the language 
of the court case, arranging them across the printed page. Philip's poem regularly shifts 
tenses from past to present, and from present to past, lending an additional voice to 
Sharpe's claim that the effects of that originary crime—the exploitation of Black bodies 
for white financial gain—are far from resolved. 40 

Our present technological infrastructure follows this same pattern of exploitation. 
In the United States, the scarcely paid or altogether unpaid labor of those who endure 
a contemporary form of enslavement—incarceration—has been used for everything 
from packaging Windows software to cleaning up the 2010 BP oil spill. 41 In a global 
colonial context, we might consider how the cobalt required to produce the lithium-ion
batteries that power our cell phones and laptops is associated with significant
human rights violations, including coercing labor from Congolese children as young
as seven. 42 The unregulated disposal of electronics has transformed roadside salvage and
repair shops in places like Agbogbloshie, just outside of Accra, Ghana, which have long
served as sites of invention and ingenuity, into toxic "e-waste" sites,
with profound consequences for the health of those who live and work there, as well
as for the environment. 43 The humanitarian and ecological stakes of our attachments
to data and technology cannot be higher, nor can their source be any clearer: the
capitalist and colonial forces that encourage the exploitation of Black and brown bodies
so that white bodies can thrive. 44

Examining Data Production 

The forces of global capitalism can feel overwhelming. And as people who use data and 
technology in our everyday work, we are each complicit to varying degrees. But there 
are certain small things we can do in our work, and in our work with others, to push 
back against this weight. In prior chapters, we have described some of these possibilities:
incorporating an examination of power into a data analysis project (chapters 1 and
2); pushing back against false binaries and hierarchies (chapter 4); including multiple 
and marginalized voices in the design process (chapter 5); and contextualizing data so 
that they are not imagined to "speak for themselves" (chapter 6). 

Along with these starting points, we can also begin to carve out additional space for 
the scholars, journalists, and other researchers who are explicitly studying the labor of 
data science—those who are examining and challenging power by tracing visualizations
and algorithms and bots back to their human and material sources. This growing
area of research might be called data production studies, borrowing a rubric from the
field of production studies that currently sits at the intersection of film and media
studies and labor studies. The primary focus of production studies, as it relates to film
and media, is how media artifacts are produced. Media studies scholar Miranda Banks
has asserted that "production studies is a feminist methodology" because it pays particular
attention to the power differentials involved in the media production process,
as well as the material conditions of media workers. 45 Work focused on data production
is already happening in fields like STS, the digital humanities, library and information
science, and archival studies, among others. 46 It looks at the production process
of datasets, algorithms, and models, and traces those products back to the people and 
conditions that enabled their creation. 
As an example of work in this emerging area, we might consider "Anatomy of an AI
System," a project by technology researcher Kate Crawford and design scholar Vladan
Joler that seeks to describe and diagram the human labor, data dependencies, and material
resources that contribute to a single Amazon Echo. The project was published online
as a diagram of Borgesian proportions, too big to view in its entirety on a standard laptop 
screen (figure 7.5a); it was accompanied by a nine-thousand-word essay. Viewers are first 
introduced to the mineral extraction required to produce the electronics components 
for the device and made aware of the hard labor (and sometimes child labor) this task 
requires. The chart (and narrative) proceeds through processes of refining, assembling, 
and distributing these components, then transporting them physically, then transporting
them virtually—through the infrastructure of the internet. Once within the Amazon
corporate boundary, the chart depicts the layers of workers who provide everything
from network maintenance to training datasets (figure 7.5b). Crawford and Joler also
diagram patterns in the organization of Amazon's labor force, which they describe in
terms of "fractal chains of production and exploitation." But what is required for this
replication is people: "At every level contemporary technology is deeply rooted in and 
running on the exploitation of human bodies," the essay concludes. 47 

"Anatomy of an AI System" is an investigation and exposé of the invisible labor
involved in making a single product on a global scale. In this way, it is an ambitious 
example of the seventh principle of data feminism: show the work. Behind the magic 
and marketing of data products, there is always hidden labor—often performed by 
women and people of color, which is both a cause and effect of the fact that this labor 
is both underwaged and undervalued. Data feminism seeks to make this labor visible 
so that it can be acknowledged and appropriately valued, and so that its truer cost—for 
people and for the planet—can be recognized. 

Crediting Data Work 

The emphasis on giving formal credit for a broad range of work derives from feminist 
practices of citation. Feminist theorist Sara Ahmed describes this practice as a way of 
resisting how certain types of people—usually cis and white and male—"take up spaces 
by screening out others." 48 When those other people are screened out, they become
invisible, and their contributions go unrecognized. The screening techniques that lead to
their erasure, as Ahmed terms them, are not always intentional, but they are, unfortunately,
self-perpetuating. Ahmed gives the example of sinking into a leather armchair
that is comfortable because it's molded to the shape of your body over time. You probably
wouldn't notice how the chair would be uncomfortable for those who haven't
spent time sitting in it—those with different bodies or with different demands on
their time. This is why those of us who occupy those comfortable leather seats—or,
more likely in the design world, molded plastic Eames chairs—must remain vigilant in
reminding ourselves of the additional forms of labor, and the additional people, that
our own data work rests upon.

Figure 7.5 (following three pages)

Overview (a) and detail (b) of "Anatomy of an AI System" (2018)—a diagram and essay by Kate
Crawford and Vladan Joler that attempts to chart all of the human labor, data, and planetary
resources used to create an Amazon Echo device. Courtesy of Kate Crawford and Vladan Joler.

This gets complicated quickly even on the scale of a single data science project. The 
names of all the people and the work they perform are not always easy to locate—if 
they can be located at all. But taking steps to document all the people who work on a 
particular project at the time that it is taking place can help to ensure that a record of 
that work remains after the project has been completed. In fact, this is one of the four
core principles of the Collaborators’ Bill of Rights, a document developed
by an interdisciplinary team of librarians, staff technologists, scholars, and postdoctoral 
fellows in 2011 in response to the proliferation of types of positions, at widely divergent 
ranks, that were being asked to contribute to data-based (and other digital) projects. 49 

When designing data products from a feminist perspective, we must similarly aspire 
to show the work involved in the entire lifecycle of the project. This remains true even
when it is difficult to name each individual involved, or when the work is collective
in nature and cannot be attributed to a single source. In these cases, we might
take inspiration from the Next System Project, a research group aimed at documenting
and visualizing alternative economic systems. 50 In one report, the group compiled
information on the diversity of community economies operating in locations as far-ranging
as Negros Island, in the Philippines; Quebec province, in Canada; and the state
of Kerala, in India. The report employs the visual metaphor of an iceberg (figure 7.6), in 
which wage labor is positioned at the tip of the iceberg, floating above the water, while 
dozens of other forms of labor—informal lending, consumer cooperatives, and work 
within families, among others—are positioned below the water, providing essential 
economic ballast but remaining out of sight. 

With the idea of underwater labor in mind, we might return to the example of 
GitHub, which began this chapter, to ask what additional forms of labor might contribute
to the production of code but cannot be represented by the visualization scheme
that GitHub currently employs. We might think of the work of the project manager,
which is not directly expressed in a particular number or size or frequency of contributions,
but nevertheless ensures the quality and consistency of all project code. We
might wonder about the work of the designer on a project or of the technical writer— 
both of whom might have helped to shape the project in its initial phases, but who 
have likely moved on to other tasks. In the case of a consumer-facing project, we might 
also consider the contributions of the customer support teams. Or in a community-oriented
project, we might include organizers who have spent years developing strong
relationships with community members. These forms of labor, both productive and 
reproductive, are essential to the success of any project but are not currently rendered 
visible, nor could they ever be easily visualized, by a scheme that considers project 
contributions to consist of code alone. 51 

But in more instances than you might think, the labor associated with data work can
be surfaced through the data themselves. For instance, historian Benjamin Schmidt,
whose research centers on the role of government agencies in shaping public knowledge,
decided to visualize the metadata associated with the digital catalog of the US
Library of Congress, the largest library in the world (figure 7.7). 52 Schmidt's initial
goal was to understand the collection and the classification system that structured the
catalog. But in the process of visualizing the catalog records, he discovered something
else: a record of the labor of the cataloguers themselves. When he plotted the year that
each book's record was created against the year that the book was published, he saw
some unusual patterns in the image: shaded vertical lines, step-like structures, and dark
horizontal bands that didn't match up with what one might otherwise assume would be a
basic two-step process of (1) acquire a book and (2) enter it in.

Figure 7.6

The "Diverse Economies Iceberg" (2017), a diagram of multiple labor practices created by the
Next System Project for a report on cultivating community economies. Image courtesy of J. K.
Gibson-Graham, Jenny Cameron, Kelly Dombroski, Stephen Healy, and Ethan Miller for the
Next System Project.

Figure 7.7

"A Brief Visual History of MARC Cataloging at the Library of Congress" (2017) visualizes when
books at the Library of Congress entered their digital catalog. Image courtesy of Benjamin M.
Schmidt.

The shaded vertical lines, Schmidt soon realized, showed the point at which the 
cataloguers began to turn back to the books that had been published before the library 
went digital, filling in the online catalogue with older books. The step-like patterns 
indicated the periods of time, later in the process, when the cataloguers returned to 
specific subcollections of the library, entering in the data for the entire set of books 
in a short period of time. And the horizontal lines? Well, given that they appear only 
in the years 1800 and 1900, Schmidt inferred that they indicated missing publication 
information, as best practices for library cataloguing dictate that the first year of the 
century be entered when the exact publication date is unknown. 

Viewed with an emphasis on showing the work, these visual artifacts should also prompt
us to consider just how much physical work was involved in converting the library’s 
paper records to digital form. The darker areas of the chart don't just indicate a larger 
number of books entered into the catalog, after all. They also indicate the people who 
typed them all in. (Schmidt estimates the total number of records at ten million and 
growing.) Similarly, the step-like formations don't just indicate a higher volume of data 
entry. They indicate strategic decisions made by library staff to return to specific parts 
of the collection and reflect those staff members' prior knowledge of the gaps that 
needed to be filled—in other words, their intellectual labor as well. Schmidt's visualization
helps to show how the dataset always points back to the data setting—to use
Yanni Loukissas's helpful phrase—as well as to the people who labored in that setting 
to produce the data that we see. 53 

Crediting Emotional Labor and Care Work 

In addition to the invisible labor of data work, there is also labor that remains hidden
because we are not trained to think of it as labor at all. This is what is known
as emotional labor, and it’s another form of work that feminist theory has helped to 
bring to light. 54 As described by feminist sociologist Arlie Hochschild, emotional labor 
describes the work involved in managing one's feelings, or someone else's, in response 
to the demands of society or a particular job. 55 Hochschild coined the term in the 
late 1970s to describe the labor required of service industry workers, such as flight 
attendants, who are required to manage their own fear while also calming passengers
during adverse flight conditions, and generally work to ensure that flight passengers
feel cared for and content. In the decades that followed, the notion of emotional labor
was supplemented by a related concept, affective labor, so that the work of projecting a
feeling (the definition of emotion) could be distinguished from the work of experiencing
the feeling itself (the definition of affect). 56

We can see both emotional and affective labor at work all across the technology 
industry today. Consider, for instance, how call center workers and other technical 
support specialists must exert a combination of affective and emotional labor, as well 
as technical expertise, to absorb the rage of irate customers (affective labor), reflect back 
their sympathy (emotional labor), and then help them with—for instance—the configuration
of their wireless router (technical expertise). 57 In the workplace, we might
also consider the affective labor required by women and minoritized groups, in all situations,
who must take steps to disprove (or simply ignore) the sexist, racist, or otherist
assumptions they face—about their technical ability or about anything else. And they 
must do so while also performing the emotional labor that ensures that they do not 
threaten those who hold those assumptions, who often also hold positions of power 
over them. 58 Are there ways to visualize these forms of labor, giving visual presence— 
and therefore acknowledgement and credit—to these outlays of work? 

One example that strives to visualize emotional and affective labor is the Atlas of 
Caregiving (figure 7.8), an ongoing project that aims to document the work involved 
in caring for a chronically ill family member. The project's name plays on the concept 
of the anatomy atlas, a compendium of illustrations of the human body that doctors 
can consult for information and reference. In this case, the goal was to illustrate the 
sometimes physical and sometimes emotional or affective work of care. The research 
team outfitted its participants with a variety of biometric sensors, including accelerometers
and heart rate monitors, as well as with body cameras programmed to take a
picture every fifteen minutes. They then visualized these data alongside excerpts from 
personal interviews and from the activity logs they asked the caregivers in the study 
to complete. 

The result is a complex picture of caregiving, one that marshals data in the interest
of creating a comprehensive view of the range of labor involved in caregiving work. 59
The stress of serving as a caregiver—a form of affective labor—is broken down into six
distinct levels, and then visualized as a gradient (figure 7.8a). The work of caregiving
itself is divided into seven subtypes of work, including concrete tasks like healthcare
management and household chores, and more abstract forms of labor like being available
and social support (figure 7.8b). This, too, helps others recognize the wide range
of work—indeed, expertise—associated with caregiving. And as some of the study's
participants reported, it helped them to recognize that work for themselves. 60

Figure 7.8

The Atlas of Caregiving visualizes the labor of caring for chronically ill family members. (a) A
thirty-six-hour log of caregiving activities; (b) caregiving activities separated by type; (c) a photo
log created during that same time. Image courtesy of the Atlas of Caregiving, 2016.

Of course, a diagram of work is only a proxy for the work itself—and that is to say 
nothing about the complexity of human feelings. This understanding served as the genesis
for "Bruises—the Data We Don't See." 61 This artful visualization, created by designer
Giorgia Lupi and accompanied by a musical score composed by Kaki King, attempts to
get closer to a visual representation of the emotional toll of caregiving (figure 7.9). The 
project began when King's daughter was diagnosed with a rare autoimmune disease, 
idiopathic thrombocytopenic purpura (ITP). ITP is described as a "very visual disease," 
and presents as bruises and burst blood vessels all over the body. For this reason, King 
was instructed to watch her daughter's skin and record any significant changes. She 
also recorded her own feelings in terms of hope, stress, and fear, creating subjective 
data to complement the hard numbers she received from the blood tests her daughter 
was required to endure. 

When Lupi, who knew King from previous collaborations, set out to design her
visualization, her goal was to "evoke empathy" and help her audience "feel a part of
a story of a human's life." 62 In contrast to the Atlas of Caregiving, which relies upon
standard visualization techniques like radial timelines and Gantt-style charts to legitimate
the work of care, Lupi sought alternative visualization strategies to emphasize the
particularity and specificity of a single family’s situation. She employed a fluid timeline
to reflect the subjective nature of what disability studies scholars call crip time. As
Ellen Samuels explains the term, "Sometimes we just mean that we're late all the
time—maybe because we need more sleep than nondisabled people, maybe because the
accessible gate in the train station was locked." 63 But it can also mean something more
profound, as described by Alison Kafer: "Rather than bend disabled bodies and minds
to meet the clock, crip time bends the clock to meet disabled bodies and minds." 64

Figure 7.9 (following three pages)

Still from Bruises—the Data We Don't See (2018) and the legend that helps decode the data visualization.
Image courtesy of Giorgia Lupi and Kaki King.

In Lupi's depiction of how the clock met King's daughter's body and King's own 
mind, days became white aspen-shaped leaves, segmented not by weeks or years but by 
hospital visits. Red dots were employed to indicate platelet counts, with color deployed 
mimetically to convey the intensity of the bruises, as well as the visuality of the data 
recorded by King. Lupi also employed color to represent King's record of her feelings, 
with black corresponding to stress and fear and yellow to signify hope. King's fear 
and hope were also visualized by hand-drawn lines that reflected each on a scale of 
one to ten. The result is rendered as an animation that unfolds over time and is set to 
music, a visually and aurally affecting composition of the affective labor of mothering
and care. 

Of course, neither Lupi and King nor the Atlas of Caregiving project team are the 
first to want to identify and make visible the work of care. As early as 1969, shortly after 
the birth of her own child, artist Mierle Laderman Ukeles penned the Manifesto for 
Maintenance Art, which called on the art world to elevate the care and maintenance 
of human life to an art, over and above the solitary creative (male) genius. 65 In the 
years that followed, care work would become a significant topic of interest for feminist 
scholars—especially after the mid-1990s, when Nancy Folbre formalized the term. Folbre's
primary model of care work was the everyday work of caring for a child, although
care work, like housework, isn't necessarily performed for free. It can also include the
underwaged work performed by daycare workers or home health aides, as well as the
waged work of doctors, nurses, physical therapists, mental health professionals, and so
on. What binds these forms of work together across economic lines is their motivation.
As theorized by Folbre, care work is undertaken out of a sense of compassion or responsibility
for others, rather than with a goal of monetary gain. But when it comes to the
market, altruism is a double-edged sword. These same professional care workers—who 
are predominantly women and people of color—are often paid less than they would be 
in other fields. 66 Why? Because they care. 

So how do we "show the work" of care workers? How do we ensure that this work 
is sufficiently recognized and valued? And can we do anything more to challenge the 
root cause of this undervalued work? In the academy, groups like the Maintainers have 
sought to learn from the theories of care developed by feminist labor studies scholars 
such as Folbre as they attempt to make visible and value the labor of data work. 67 
Through workshops, conferences, and publications, the Maintainers seek to counter 
the current tendency in technology fields to celebrate innovation and discovery alone. 
The work that maintains and sustains the world we live in today should also be celebrated,
they insist. Among their current areas of research are the people they call Info-Maintainers:
the people who work in libraries and archives and in related preservation
fields to ensure that the knowledge of the present remains accessible for generations to
come. Because the work of librarians and archivists and curators is focused on facilitating
access to future knowledge, the Maintainers argue, it can be viewed as a form of
care work too. 

Across many technical fields, increasing attention is being paid to care work
and to other forms of invisible labor, now that so much work is virtual
rather than physical, as well as to issues of job insecurity, now that white-collar jobs
have begun to be outsourced to freelancers as well. In this context, it is important to
recall that professional care workers have long dealt with issues of undercompensated
and precarious work; and for just as long, they have been involved with efforts to 
resist and organize against the inequities they have faced. Today, these efforts are being 
enhanced by data and technology, as unions and other advocacy groups are making use 
of new platforms and data streams for their work. But they are also being obstructed, 
as Uber-style apps to connect caregivers and employers increasingly abound. These 
apps do nothing to solve the systemic problems that caregivers face. A 2016 study of 
on-demand domestic worker apps by the UK's Overseas Development Institute (ODI) 
reports that because they displace risk onto workers, these platforms potentially reinforce
discrimination and "further entrenchment of unequal power relations within the
traditional domestic work sector." 68 

As a corrective, we might look to emerging prototypes that center the needs of workers,
those that are developed by and with workers themselves. In the US, for example,
the National Domestic Workers Alliance (NDWA) has developed an app, Alia, to
serve as a portable benefits platform. 69 It allows clients to contribute a small amount
into the worker's benefits account each time that worker provides them with a service.
Workers can then pool contributions from multiple clients to purchase benefits
on demand, such as paid time off and various forms of insurance. Caveats remain, of
course: Shouldn't the government require that all workers receive paid time off as a
matter of course? Shouldn't we be advocating for a single-payer healthcare system? Yes
and yes. But while the NDWA continues to lobby for systemic change, its app offers one
way to provide essential benefits to domestic workers right now. It is a harm reduction
strategy, one that can be pursued while simultaneously advocating for more transformative
change. Thinking back to Kimberly Seals Allers's app, Irth, discussed in chapter
1, we might also begin to imagine how its successful use would contribute to a dataset 
that could be used to support future advocacy efforts. 

Show Your Work 

Data work is part of a larger ecology of knowledge, one that must be both sustainable
and socially just. Like the ship paths visualized on the Ship Map or the source
code stored on GitHub or the global assemblage of people and materials that make an 
Amazon Echo device, the network of people who contribute to data projects is vast 
and complex. Showing this work is an essential component of data feminism, and it 
is the reason why "show your work" is the seventh and final principle in this book. 
An emphasis on labor opens the door to the interdisciplinary area of data production
studies: taking a data visualization, model, or product and tracing it back to its
material conditions and contexts, as well as to the quality and character of the work
and the people required to make it. This kind of careful excavation can be undertaken 
in academic, journalistic, or general contexts, in all cases helping to make more clearly 
visible—and therefore to value—the work that data science rests upon. 

We can also look to the data themselves in order to honor the range of forms of 
invisible labor involved in data science. Who is credited on each project? Whose work 
has been "screened out"? While one strategy is to show the work behind making data 
products themselves, another strategy for honoring work of all forms is to use data 
science to show the work of people (mostly women) who labor in other sectors of the 
economy, those that involve emotional labor, domestic work, and care work. We see 
this in action in the Atlas of Caregivers, which focuses on legitimizing care work, and 
the Alia app, which provides more financial security for domestic workers. Designing 
in solidarity with domestic workers can begin to challenge the structural inequalities 
that relegate their work to the margins in the first place. 

This point brings us back to the ideas about power that began this book. Power 
imbalances are everywhere in data science: in our datasets, in our data products, and in 
the environments that enable our data work. Showing the work is crucial to ensure that 
undervalued and invisible labor receives the credit it deserves, as well as to understand 
the true cost and planetary consequences of data work. 
On November 1, 2018, at 11:10 a.m. local time, workers at Google offices in fifty cities
around the world closed their browser tabs, shut their laptops, and walked off their 
jobs. 1 The walkout included both full-time employees and freelancers. It was women-led
at a company that, despite years of lip service to inclusion, has a workforce that is
only 31 percent women. 2 And it was massive—more than twenty thousand workers participated
(figure 8.1). Why did workers at one of the most powerful companies on the
planet take to the streets? 

One week earlier, the New York Times broke a story about the $90 million exit package
that Andy Rubin, the creator of Google’s Android mobile operating system, had
received after he was accused of sexual misconduct (and after an internal investigation
had found the claim to be credible). 3 The story mentioned two other executives
accused of sexual misconduct whom Google had similarly protected. As journalists 
Daisuke Wakabayashi and Katie Benner wrote, "In settling on terms favorable to two 
of the men, Google protected its own interests.” Evidently, Rubin's package had been 
paid out in installments of $2 million per month over the course of four years. His final 
payment was scheduled for later that month. 

As soon as the New York Times article was published, additional stories of discrimination
faced by women, as well as men and nonbinary people, began pouring out on
company email lists and chat channels and in face-to-face forums. The stories pointed 
to patterns of toxic behavior. 4 Within a week, the massive walkout—initially floated as 
an idea on a Google moms list—had been planned. "Tech industry business as usual 
is failing us," said Meredith Whittaker, the founder of Google's Open Research Group. 
"Google paying $90M to Andy Rubin is one example among thousands, which speak 
to a company where abuse of power, systemic racism, and unaccountable decision-making
are the norm. ... It's clear that we need real structural change, not adjustments
to the status quo.” 5 
Figure 8.1

The Sunnyvale, California, Google campus during the Google Walkout for Real Change on 
November 1, 2018. Employees turned out en masse to protest the company's handling of sexual 
misconduct cases. Courtesy of Wikimedia Commons user Grendelkhan. 


A group of seven core organizers, including Whittaker, came together to craft five 
concrete demands, including ending forced arbitration in cases of discrimination and 
sexual harassment and promoting the chief diversity officer to report directly to the 
CEO. 6 When November 1 arrived, employees congregated first in indoor atriums, and 
then in courtyards and on streets. They carried signs that said, "Not OK, Google," "I 
Reported, He Got Promoted," and "Happy to Quit for $90 Million, No Sexual Harassment
Required.” Google management started paying attention.

Although the Google Walkout for Real Change, as the protest was formally known, 
was framed in the media as a milestone for big tech, there are clear precedents for
white-collar tech organizing. Historian Mar Hicks has connected the Google walkout to a strike
among computer workers—then a workforce composed mainly of women—
that took place in the United Kingdom in the 1970s. The strike took down twenty-six 
government computer centers and disrupted the work of nine others. These were the 
centers that enabled the government to process its value-added tax (VAT), and without 
the computers online, the tax couldn't be collected. The government was required to 
pay attention. Writes Hicks, "Even though many of these workers were women, and 
limited in their pay, promotion, and work opportunities due to sexism, their proximity 
to the literal machinery of government gave them a great deal of power." 7 

The organizers of the Google walkout recognized their proximity to another source 
of power: Google itself. A single worker might have limited power, but their collective 
organizing drew attention to the proximity of a relatively small number of people— 
Google employees—to the global digital infrastructure of everyday life. Part of the reason
that data and computation have proved to be so lucrative is their ability to scale.
As journalist Moira Weigel points out, "This kind of scale means these companies can
make extraordinarily high profits. But it also means the core workers they rely on have 
an extraordinary amount of bargaining power." 8 They also have messaging power, 
interruption power, and subversion power. 

How might tech workers marshal these strengths to mass-occupy digital infrastructure?
To teach algorithms to "work to rule" in the style of assembly-line slowdowns?
To slow the flow of everyday capitalism to gather attention? To channel digital solidarities
back into physical spaces and human relationships?

There are already many examples that point to how these questions might begin 
to be answered. In an article about tech organizing in the magazine n+1, for instance, 
an anonymous software developer points out, "If the developers from Slack decided to 
strike, they could, without too much difficulty, push out a change that made it so that 
any message that got sent would push a message about the purpose of the strike" to its 
ten million daily users. 9 And just for a minute, imagine if they did. 

Although Slack developers haven't hacked their own platform (yet), collective organizing
around data and technology has already taken a range of powerful forms. Groups
like the Tech Workers Coalition are building bridges between the programmers who 
code the search engines and the cafeteria workers who prepare their food. They have 
also helped popularize the hashtag #TechWontBuildIt to indicate a collective refusal 
to work on ethically compromised software. 10 Platforms like Coworker.org are helping 
gig-economy workers, like Uber drivers, get organized. Other organizations, such as 
Tech Solidarity, are focusing on electoral politics. Some projects are taking explicitly 
political stands; the Lerna JavaScript library briefly added a clause to its license prohibiting
entities that collaborate with US Immigration and Customs Enforcement (ICE)
from using it. 11 Individuals are forming worker-owned tech cooperatives in the United 
States and around the globe and drafting values statements, such as the Design Action 
Collective's Points of Unity, that guide their work together and help them decide which 
projects to take on. 12 Other collective organizing efforts are working to draft codes of 
ethics like the Toronto Declaration 13 and statements of values like those guiding the 
Canadian government's action plan for open government. 14 Note that these efforts are 
not limited to white-collar workers, nor to employees of the big five technology companies,
nor to large-scale events.

Some groups are using movement-building strategies to effect change across entire 
industries. For example, Una Lee, Wesley Taylor, Victoria Barnett, Ebony Dumas, Carlos
(LOS) Garcia, and Sasha Costanza-Chock are coordinating a networked community
of practice called design justice. 15 The idea for design justice emerged from a workshop
at the Allied Media Conference in Detroit in 2015, where thirty people assembled to 
challenge the idea of "design for good." As co-organizer Una Lee put it, "How could
we redesign design so that those who are normally marginalized by it, those who are 
characterized as passive beneficiaries of design thinking, become co-creators of solutions,
of futures?" 16

Since then, the design justice group has produced dozens of workshops, pop-up 
educational forums, and scholarly texts. One of its central projects is a set of ten Design 
Justice Network Principles, which guide designers in navigating inequality and achieving
justice through design. 17 Principle 1, for example, reads: "We use design to sustain,
heal, and empower our communities, as well as to seek liberation from exploitative and 
oppressive systems." Principle 5 reads: "We see the role of the designer as a facilitator 
rather than an expert." The Design Justice Network promotes these principles through 
its workshops and other events at which designers meet, discuss, and co-conspire. To 
date, more than 350 additional designers have signed on. 

Data for Black Lives (D4BL) is another example of inspired organizing and movement
building at a national scale. Founded by veteran organizer Yeshimabeit Milner,
who was herself trained by Black Lives Matter organizers, D4BL is "a network [of] over 
4,000 scientists and activists working to harness the power of data and technology to 
make real change in the lives of Black people.” 18 D4BL organizes annual conferences, 
runs online communities, and helps connect people in its network. The group pursues 
two simultaneous strategies: pushing back against the harmful impacts of data as they 
are currently deployed, and creating new spaces for organizers, data scientists, and 
engineers to come together to generate meaningful research questions. The group’s 
emphasis on abolition and liberation, rather than a generic form of social good, leads it 
to design projects that actively work to overturn the data-driven discrimination experienced
in Black communities. Milner's vision is "to make data a tool for profound social
change instead of a weapon of oppression." 19 

The vision of D4BL will take time to realize, as is true of all visions that motivate 
transformative work. The organizers of the Google Walkout for Real Change are discovering
this as we write. When they first assembled, they envisioned a world in which
executives would listen to the demands of their workers and would undertake immediate
measures for change. Although Google publicly expressed support for the workers
involved, and the CEO issued a memo that read, "We are taking in all their feedback so 
we can turn these ideas into action," that action has yet to transpire. Claire Stapleton, 
the woman who originally floated the idea of taking mass action, stated: "We're almost 
three months out from the walkout and exactly zero of the five demands have been 
met." The corporation did end forced arbitration—and it led to other tech companies 
doing the same—but it was only a partial win because it covered cases of sexual misconduct
alone, not all discrimination cases. As Amr Gaber, another key organizer, added,
"it's also the cheapest thing, the most minor thing they could've done." 20

These paltry actions, clearly motivated by the bottom line, underscore the unyielding
influence of profit and power and the need for a feminism that is intersectional as
a matter of course. In this book, we have described intersectional feminism—a vibrant
body of knowledge and action that challenges the unequal distribution of power—and 
how it can be applied to the field of data science today. In doing this work, we have 
drawn heavily from the work of Black feminist theorists and activists, to reflect both 
their central role in defining and elaborating intersectionality and our own position as 
white scholars and white women in the United States. Here, we want to reiterate our 
appreciation for this foundational work, as well as to once again acknowledge that we 
cannot speak directly from the life experiences that motivate it. We hope that you, our 
readers, will use this work in order to reflect on your own identities, as well as to examine
how power and privilege operate in data science and in the world.

As we write this conclusion, in July 2019, issues of power and privilege continue to 
loom large. The other four and a half demands issued by the organizers of the Google 
walkout included "a commitment to end pay and opportunity inequity” at all levels of 
the corporation and the collection of "transparent data on the gender, race and ethnicity
compensation gap, across both level and years of experience," as well as access to
extant sexual misconduct reporting mechanisms by all Google employees, including
its contract workers (who make up around half of the company's total workforce). 21
Yet the public memo stated blandly that Google would continue to work on "creating 
a more inclusive culture for everyone." 22 Meanwhile, Google's lawyers have been filing 
legal documents that urge the US National Labor Relations Board to overturn the 2014 
ruling that allows workers to use company email to organize without fear of
retaliation. 23 If the ruling were overturned, it would seriously impede any efforts to organize
future actions at Google, or at any large corporation, because company email lists are 
the primary way that a distributed workforce can organize across office locations and 
time zones. 

Further complicating future organizing efforts, numerous Google employees, including
lead organizers Whittaker and Stapleton, have faced retaliation and even demotion
in the months following the walkout. These internal actions have been documented 
by Wired magazine, Bloomberg News, and the tech news site Packt, among other news 
outlets. For example, Stapleton was told to go on medical leave even though she was 
not sick, and the decision was only reversed after she hired a lawyer; and Whittaker 
was told that she would be required to "abandon her work” with the AI Now Institute,
an independent research group focused on issues of AI and ethics. 24 Stapleton left
Google in June 2019 and Whittaker left in July of that year, two high-profile departures
that the Guardian surmised would "have a chilling effect” on tech workplace
activism. 25 

This is the deployment of the structural and disciplinary domains of the matrix of 
domination, which we introduced in chapter 1. Google's legal team is well-resourced 
and has the power to shape both federal laws and company policies. This confirms 
the need to monitor dominant groups and institutions that wield outsized power in 
the world (and tend to use it to secure their positions). It also affirms the need to collaborate
with the groups most impacted by differentials of power. In chapter 2, and
throughout this book, we have attempted to heed our own advice, featuring the voices 
and ideas of those with direct experience of injustice. In so doing, we have sought to 
feature the sites of energy that have inspired us in our work—ranging from new activist
networks to data journalism startups, from librarians authoring data user guides to
engineers interrogating human-reporting bias. We've drawn from the work of sociologists
who theorize digital power, artists who challenge technological neutrality, educators
who teach statistics in real-world settings, and individuals who are single-handedly
compiling spreadsheets of missing data. It is from all these locations, using all these 
methods, and including all these people—and more—that we can challenge the matrix 
of domination in data science at its source. 

As should now be clear, our definition of data science includes more than quantitative
methods, more than "big" data, more than "artificial" intelligence, and more than
"neutral" displays. We explored the limitations of such a narrow view of data science 
and its communication in chapter 3. There and throughout the book, we have argued 
that an expansive conception of data science is essential if we are to work toward our 
goal of remaking the world. 

Enabling this feminist data science to flourish and thrive will require deliberate 
interventions in each phase of data work, and in our received ideas about the people 
and communities who perform it. In chapter 4, we showed how the decisions that 
are made when first collecting data go on to impact future results. In chapter 5, we 
debunked the myth that data science is a solo enterprise, undertaken by genius wizards 
working alone. Data science involves collaboration and community, as well as deep 
context, as we discussed in chapter 6. Equally important is the acknowledgment, as 
explored in chapter 7, that data science is the work of many hands. 
Model Card - Smiling Detection in Images


Model Details

• Developed by researchers at Google and the University of Toronto, 2018, v1.

• Convolutional Neural Net. 

• Pretrained for face recognition then fine-tuned with cross-entropy loss for binary 
smiling classification. 

Intended Use 

• Intended to be used for fun applications, such as creating cartoon smiles on real 
images; augmentative applications, such as providing details for people who are 
blind; or assisting applications such as automatically finding smiling photos. 

• Particularly intended for younger audiences. 

• Not suitable for emotion detection or determining affect; smiles were annotated 
based on physical appearance, and not underlying emotions. 

Factors 

• Based on known problems with computer vision face technology, potential relevant
factors include groups for gender, age, race, and Fitzpatrick skin type;
hardware factors of camera type and lens type; and environmental factors of 
lighting and humidity. 

• Evaluation factors are gender and age group, as annotated in the publicly available 
dataset CelebA [36]. Further possible factors not currently available in a public 
smiling dataset. Gender and age determined by third-party annotators based 
on visual presentation, following a set of examples of male/female gender and 
young/old age. Further details available in [36]. 

Figure 8.3 

Detail of a model card, from a 2019 paper titled "Model Cards for Model Reporting" by AI 
researcher Margaret Mitchell and coauthors that proposes short documents called model cards that 
would accompany machine learning models as a form of documentation. Model cards detail who 
developed the model, for what purpose, and how the model performs, including intersectional 
identity metrics. Model cards would also specify known limitations of a model and use cases for 
which the model is not suitable. Courtesy of Margaret Mitchell. 
Figure 8.4

Feminindex is a civic media project that documents and visualizes where all Argentine political 
candidates stand on gender and LGBTQ+ issues, including reproductive rights, femicides, and 
trans rights. The first version was released in 2017 and the second in 2019. Courtesy of Economia 
Femini(s)ta, including Mercedes D'Alessandro, Andres Snitcofsky, Lina Castellanos, Aldana Vales, 
and the Economia Femini(s)ta team. See http://economiafeminita.com/activismo/feminindex/.

“Redlining destroyed the possibility of investment wherever Black people lived.” Ta-Nehisi Coates, “The Case for Reparations”

Figure 8.5 

Decoding Possibilities (2017), by Ron Morrison and Treva Ellison, is an artistic examination of
redlining's effects in the landscape as well as a celebration of creative resistance to redlining. (a)
Contemporary maps of Boston are combined with historic redlining maps, as well as maps created
from the Combahee River Collective's writings. (b) The installation includes quotes on the enduring
effects of redlining in the landscape. Courtesy of Ron Morrison and Treva Ellison.
Throughout the book, we have described our seven principles of data feminism:
examine power, challenge power, elevate emotion and embodiment, rethink binaries and
hierarchies, embrace pluralism, consider context, and make labor visible. We derived these 
principles from the major ideas that have emerged in the past several decades of intersectional
feminist activism and critical thought. At the same time, we welcome the
notion that there are many other possible starting points that share the end goal of 
using data (or refusing data) in order to end oppression. 26 

Those other starting points might come from within the academy. For example, 
the work of the Center for Spatial Research at Columbia, led by Laura Kurgan, uses a
uniquely transdisciplinary approach that includes data science and AI, the humanities,
geography, and design to investigate complicated phenomena like urban/rural displacement
due to conflict (figure 8.2). Scholars like Dean Spade are using queer theory
to challenge the institutions that wield data. And media studies scholars are examining 
the intersections of race, gender, sexuality, and data, as Shaka McGlotten does through 
their Black data project. 27 Researchers are writing books about Indigenous statistics 
and Indigenous data sovereignty, 28 developing decolonial design methods, 29 and leading
dynamic conversations about decolonizing data in both the Global North and the
Global South. 30 Computer scientists and AI researchers are conducting important studies
on bias, as well as developing new ways to promote transparent and responsible use
of AI. For example, Margaret Mitchell and her coauthors have recently proposed model 
cards (figure 8.3), a form of documentation that would accompany machine learning 
models to detail their intended uses and their technical and ethical limitations. 31 

There are many other possible starting points for challenging oppression in data 
that come out of the arts, activism, community organizing, and consciousness-raising. 
Cartographer Margaret Pearce's next mapping project indigenizes the Mississippi River 
map to make new spaces for public dialogue about flood management. Mimi Onuoha
and Mother Cyborg's People's Guide to AI, designed for newcomers, provides an
accessible introduction to the ideas behind artificial intelligence. The activist group 
Economia Femini(s)ta in Argentina has an ongoing civic accountability project called 
Feminindex in which the group visualizes where each candidate stands in relation to 
a range of gender and LGBTQ+ issues (figure 8.4). The group has even produced digital 
trading cards for politicians, which it circulates on social media. And artist-researchers 
Ron Morrison and Treva Ellison disrupt Boston redlining maps from 1935 with overlays
of "black queer, trans, and feminist geographies" created by the Combahee River
Collective (figure 8.5). These require viewers to put on special glasses called Racialized 
Space Reduction Lenses (RSRL) to see beneath the surface. 32 
These projects are not intended to be exhaustive, and the list could go on. What
is most important is not that we all share the same starting point, but rather that we 
nurture all of these emerging ecosystems and build links between them. We will need 
all of them for mobilizing resistance to the differentials of power embedded in our 
current datasets and data systems. And we will also need them for mobilizing courage
and creativity—to imagine what data science and artificial intelligence beyond the
matrix of domination might look like. The best time for resistance and reimagination 
is before the norms and structures and regulations of the data economy have been fully 
determined. 

So now let’s multiply. Let's multiply now. 


Our Values and Our Metrics for Holding Ourselves Accountable 


A note to readers: We created this document as we began the writing of this book and 
included it as part of the manuscript draft that was posted online as part of the open 
peer review process. We were prompted to write it because of work on a prior project— 
the Make the Breast Pump Not Suck Hackathon—with equity consultant Jenn Roberts
of Versed Education, and because of the values statements (and related statements of 
principles) published by groups such as the University of Maryland's African American 
History, Culture, and Digital Humanities Initiative (AADHum) and the University of 
Delaware’s Colored Conventions Project. 1 From these projects, we saw how statements 
of shared values can become important orientation points, guiding internal decisions 
at challenging junctures and making ethical commitments public and transparent. The 
idea to accompany our values with a set of metrics was also proposed by Jenn Roberts 
for the breast pump hackathon. We discuss that project, and the uses and limits of metrics
for accountability, in chapter 4. The metrics below were calculated twice: first
on the basis of the draft posted online, and second on the basis of the copyedited book 
manuscript. Aside from the addition of the second set of metrics, and a short reflection 
on our successes and failures, the language of the document remains unchanged from 
the version posted as part of the manuscript draft. 

We Insist on Intersectionality 

Feminism has always been multivocal and multiracial, but the movement's diverse
voices have not always been valued equally. The women's suffrage movement largely 
excluded Black women and the abolition of slavery from its agenda. In the 1970s, 
lesbian feminists were called "the purple menace” by straight feminists. But feminism 
fails altogether if it is only for elite, white, straight, Christian, Anglo women. The work 
of activists and scholars, particularly Black feminists, over the past forty years insists on
a feminism that is intersectional, meaning it looks at issues of social power related not 
just to gender, but also to race, class, ability, sexuality, immigrant status, and more. It 
does so, moreover, by looking to collectives as well as individuals, to structural issues as 
well as specific instances of injustice. 

We Advocate for Equity 

Equity is both an outcome and a process. Future justice must account for an unjust past 
in which some groups' knowledges have been valued and others have been subjugated, 
as Patricia Hill Collins teaches us. In the process of achieving equity, those of us in 
positions of relative power must learn to listen deeper and listen differently—with the 
ultimate goal of taking action against the status quo that benefits us at the expense of 
others. For this reason, we listen and give priority in the text to voices who speak from 
marginalized perspectives, whether because of their gender, ability, race, class, colonial 
status, or other aspects of their identity. 

We Prioritize Proximity 

As Kimberly Seals Allers, women's health advocate, says, "Whatever the question, the 
answer is in the community." People in a community know its problems intimately, 
and they know which phenomena go uncounted, underreported, or neglected by institutions
in power (or, conversely, who is overly surveilled by institutions in power).
They also know what interventions will work to solve those problems. In this book, we 
try to prioritize voices with closer and more direct experience of issues of injustice over 
those that study a data injustice from a distance. 

We Acknowledge the Humanity of Data 

We recognize that the transformation of human experience into data often entails a 
reduction in complexity and context. We further acknowledge that there is a long history
of data being "all too often wielded as an instrument of oppression, reinforcing
inequality and perpetuating injustice," as the group Data for Black Lives explains. We 
keep these inherent constraints in mind as we write, attempting to introduce context 
and complexity whenever possible, and acknowledge the limits of the methods we 
discuss, as well as their strengths. 
We Are Reflexive, Transparent, and Accountable

Acknowledging that our knowledge is shaped by our own perspectives and limitations
(see more in the About Us section ahead), we strive to be reflexive, transparent,
and accountable for our work. We are on a journey toward justice, and that
inevitably involves making mistakes. We are grateful to those who have shown us 
generosity in letting us learn up to this point. And we respectfully say to our future 
teachers that you will find in us open listeners: we recognize direct and critical words 
as a generous offer and a vote of confidence in our ability to hear and be transformed 
by you. 

To that end, we have an evolving table of explicit metrics (table A.1) that will guide
us in auditing our citations and the examples that we elevate in the book. We note, 
here, that our foregrounding of race and racism reflects our location in the United 
States, where the most entrenched issues of inequality and injustice have racism at 
their source. 

About Us 

Feminist standpoint theory recognizes the value of situated knowledge—acknowledging 
the perspectives and experiences of the knower and how those have shaped the knowledge
they produce. Accordingly, we situate ourselves and the learning contexts in
which we work. 

Catherine D'Ignazio is an assistant professor at Massachusetts Institute of Technology, a private
research university in Cambridge, MA. Before moving to MIT, she worked at Emerson College, a 
private college in Boston focused on communications and the arts. From a middle-class, Italian-American
and Scotch-Irish background, she grew up primarily in the American South, with some
formative years spent in Latin America and Europe. She is a mother, an experience that has 
sharpened her understanding of how women's bodies are stigmatized and underserved by mainstream
institutions. Working mainly in urban New England, she experiences significant privilege
from her whiteness, ability, institutional affiliation, and education, among other things, and 
experiences some oppression based on her gender. With decades of professional work in software 
programming, art/design, and digital media education, she comes to data feminism based on a 
commitment to democratize information and include more people and professions in contemporary
conversations about data and power.

Lauren F. Klein is an associate professor at Emory University, a private university in Atlanta, Georgia,
in the Southern United States. Before moving to Emory, she worked at Georgia Tech, a large
public research university also in Atlanta. From a middle-class New York Jewish family, she grew 
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Table 1 

Aspirational, draft, and final metrics and the structural problems they address 

Aspirational metrics to Final metrics 

Structural live our values for Draft metrics (copyedited 

problem this book (open peer review) manuscript) 


Racism 

• 75 percent of citations 
of feminist scholarship 
from people of color 

• 75 percent of examples 
of feminist data projects 
discussed led by people 
of color 

Scholarship: 36 
percent from 
people of color 
Projects: 49 
percent led by 
people of color 

Scholarship: 32 
percent from 
people of color 
Projects: 42 
percent led by 
people of color 

Patriarchy 

• 75 percent of all 
citations and examples 
from women and 
nonbinary people 

67 percent of 
citations and 
examples from 
women and 
nonbinary people 

62 percent of 
citations and 
examples from 
women and 
nonbinary people 

Cissexism 

• Center trans perspectives 
in discussions of the 
gender binary 

• Use transinclusive 
language throughout 
the book 

• Example or theorist in 
every chapter from a 
transgender perspective 

Three of ten 
chapters feature 
transgender 
example and/or 
theorist 

Nine of nine 
chapters feature 
transgender 
example and/or 
theorist 

Heteronormativity 

• Resist assumptions 
about family structure 
and gender roles 

• Example or theorist 
in every chapter that 
illustrates the power of 
communal (vs. family) 
support networks 

Ten of ten chapters 
feature communal 
example and/or 
theorist 

Nine of nine 
of ten chapters 
feature communal 
example and/or 
theorist 

Ableism 

• Challenge the 
dominance of 
visualization in the 
presentation of data 

• Example or theorist 
in every chapter that 
employs nonvisual 
methods of presenting 
data 

Nine of ten 
chapters feature 
nonvisual example 
and/or theorist 

TK of ten chapters 
feature nonvisual 
example and/or 
theorist 
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Table 1 (continued) 


Aspirational metrics to Final metrics 

Structural live our values for Draft metrics (copyedited 

problem this book (open peer review) manuscript) 


Colonialism 

• 30 percent of projects 

Projects: 8.5 

Projects: 7 percent 


discussed come from the 

percent from the 

from the Global 


Global South 

Global South 

South 


• Example or theorist in 

Five of ten 

Seven of nine 


every chapter about 

chapters feature 

chapters feature 


Indigenous knowledges 

indigenous 

indigenous 


and/or activism 

example and/or 
theorist 

example and/or 
theorist 

Classism 

• Acknowledge that data 

Projects: 88 

Projects: 78 


science, as a field, is 

percent from 

percent from 


premised on economic, 

outside academy 

outside academy 


educational, and 

Ten of ten 

Nine of nine 


technological privilege 

chapters feature 

chapters feature 


• 50 percent of feminist 

nonacademic 

nonacademic 


projects discussed 

example and/or 

example and/or 


come from outside the 
academy 

• Example or theorist 
in every chapter that 
demonstrates how the 
ideas can be applied 
without expensive 
technology and/or 
formal training 

theorist 

theorist 

Proximity 

• 50 percent of feminist 

Projects: 49 

Projects: 34 


projects discussed 

percent feature 

percent feature 


feature and quote 

people directly 

people directly 


people directly impacted 
by an issue (vs. those 
who study or report on 
the phenomena from a 
distance) 

impacted 

impacted 


up in suburban New Jersey and lived in New York for much of her adult life, with some time spent 
in Boston. Like D'lgnazio, she is also a mother. Working in the US South, she experiences signifi¬ 
cant privilege from her whiteness, ability, education, and institutional affiliation, among other 
things, and experiences some oppression based on her gender. She worked in web development 
before becoming an academic and comes to data feminism through her desire to convert theory 
into practice and to create more opportunities for humanities research (and researchers) to enter 
into conversation with communities, activists, organizers, and others working toward justice. 

A Reflection on Our Final Metrics

There are interesting shifts to note between the metrics of the first draft and those of 
the final version. Some measures landed closer toward our goals in the final version. 
For example, we were able to meet our goal of including a trans voice in every chapter, 
and we included more Indigenous perspectives in the final version than in the first 
draft. However, we were not able to achieve all of our aspirational metrics in the final 
draft, and in fact, the proportional representation of women, nonbinary people, and 
people of color decreased from the first draft to the copyedited manuscript. The final 
manuscript also had proportionally fewer projects from the Global South, more projects
from the academic realm, and less work from people who were directly impacted
by each issue. 

What explains this outcome that runs counter to our values and stated goals? In 
our view, there are two explanations. The first has to do with process: we did not keep 
precise tabs on our citational metrics as we were revising. We hadn't wanted to give the 
aspirational metrics so much presence in our revision process that we would choose 
people and projects simply because of their gender, race, or other markers of identity. 
That seemed tokenizing, and like "gaming" the system we had created. We did keep
the overall gaps between our draft metrics and our aspirational metrics in mind, and
attempted to close those gaps as we introduced new examples and removed old ones.
And we did in fact introduce more overall references to, for example, scholarship and
projects led by people of color (243 vs. 92), and scholarship and projects based in the
Global South (41 vs. 20). But we also introduced more references to scholarship and
projects led by white people (475 vs. 136), as well as more scholarship and projects
based in the Global North (933 vs. 278). In retrospect, since we didn't calculate the
second set of metrics until the end of the revision process, we were not aware of the
changing proportions of projects and citations, and therefore had no way of rebalancing
them after the fact.
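A quick calculation using only the counts quoted above shows how adding references to every group can still shrink one group's share. These percentages are illustrative arithmetic on those two pairs of counts (final vs. draft), not figures reported by the audit, and they need not match table 1, which reports scholarship and projects separately.

$$
\text{People of color, draft: } \frac{92}{92+136} = \frac{92}{228} \approx 40.4\%
\qquad
\text{final: } \frac{243}{243+475} = \frac{243}{718} \approx 33.8\%
$$

$$
\text{Global South, draft: } \frac{20}{20+278} = \frac{20}{298} \approx 6.7\%
\qquad
\text{final: } \frac{41}{41+933} = \frac{41}{974} \approx 4.2\%
$$

In both cases the count for the underrepresented group more than doubled, yet its share fell, because the count for the dominant group grew faster.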

An additional pass through the manuscript with this rebalancing in mind would 
have addressed the issue in the book. Indeed, we wish we had planned for this final 
pass in our revision process. But the root cause of the issue would have remained
unresolved. Put simply: this cause is ourselves and our positions as scholars in the world
of higher education, a world dominated by white people and especially white men. 
To some readers, this answer may seem obvious, but our path to it is worth taking the 
time to explain. 

Many people who participated in our peer review process (both online and
anonymously) noted that we should back up our assertions with citations. We agreed with
this point, and felt that additional citations would both credit prior work and add
legitimacy to our claims. Between the draft and the final version, we added many
endnotes and additional references. As a result, the final version of this book is
significantly more scholarly than the draft we posted online. But when looking at the history
of engagement with a particular idea, or when asking ourselves which notable person 
in a particular field we should name, we thought less about our values for the book 
and more about what we already knew about those areas. In so doing, we inadvertently 
reproduced the biases of academia—ironically, through a mechanism very similar to 
the privilege hazard we name in the book. 

We thought about not including the endnotes in our final accounting; that might 
have yielded "better" metrics. But that would have misrepresented the contents of the 
book—and, more importantly, the extent of the work that remains to be done. Instead, 
our final metrics offer a quantified reminder that "legitimate knowledge" has a race 
and a gender, as well as a class and a geographic location. These characteristics are 
inherited from the matrix of domination, and sustained by the matrix of domination 
as well. Although we challenge that fact in our writing, we readily admit that we fell 
back on learned habits in our citational practices, particularly when we felt that our 
scholarly credibility was on the line. 

Our takeaway from this process is something we already knew but learned once 
again: values are not enough. We have to put those values into action and hold
ourselves accountable time and time again. This constant emphasis on accountability is
not easy, and it is not always successful (case in point). It also takes time. Our final 
metrics are uncomfortable but in some ways constructive: they serve as evidence of the 
distance between our ideals and our actions, they help us locate the help we need to 
bridge those gaps, and they help us persist. 



Auditing Data Feminism, by Isabel Carter 


In the interest of remaining accountable to the values statement for this book, the 
authors tasked me with performing an audit of all the individuals, projects, and
organizations referenced in Data Feminism. Quantifying these references provided important
information about which perspectives were being included in the work and to
what extent. At the same time, this process presented difficult-to-answer questions 
about identification and classification. Therefore, this methods statement will serve to 
explain how we approached those questions and what our answers were. 

First, we had to decide what would constitute a "reference" that needed to be 
recorded in the audit. Every individual mentioned by name was counted as a single
reference, as was every project (e.g., Kiln's Ship Map). Corporations were mostly excluded
from the audit unless they played an active role in an example and were mentioned
more than once. For instance, Target is included because its pregnancy-targeted
marketing strategy is used as a major example in chapter 1. On the other hand, Instagram is
not included despite being mentioned multiple times in the Serena Williams example 
because in that case the social medium is merely the site where Williams and her fans 
exchanged stories of their birth experiences. 

Each categorization then required its own investigation. For individuals, we 
attempted to verify their race, gender, country of origin (which fed into our Global 
South/Global North distinction), and indigeneity. Additional categorizations included 
whether or not references were from within the academy and if they represented an 
example of good data practices or "what not to do." References were logged as "com¬ 
munal" if they were community-driven (e.g., Data for Black Lives), and a separate cat¬ 
egory recorded whether the reference provided a "nonvisual example" of data work 
or not. Each record was further classified by importance. One reference constituted 
"passing" importance; two to four references, "more than once;" and beyond that, 
"central." 
Although categories like "importance" and "nonvisual example" were straightforward
to assign, others like race and gender were not. We tried to confirm these
categorizations through online research, but without the self-identification of the referenced
individuals, these categories are obviously subject to further inquiry. It should be noted 
that only individuals were counted by race or gender, unless an example hinged on a 
racialized or gendered assertion. For example, General Motors is discussed in relation 
to its discriminatory treatment of Emma DeGraffenreid, which became the impetus for 
legal scholar Kimberlé Crenshaw's study of "intersectionality." Discriminatory hiring
on the basis of race and gender is the reason for General Motors's inclusion in the book; 
therefore the company is listed as "white" and "man" in the audit. Also, if upon further 
research an organization was found to have an all-white board or staff, like Clarksons 
Research UK, it was categorized as "white." 
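The importance tiers and the percentage metrics described above are simple enough to state as code. The Python sketch below is a hypothetical illustration, not the tooling actually used for this audit: the helper names `importance` and `share`, the record fields (`name`, `mentions`, `global_south`), and the sample records are all assumptions. Following the note that only some references were classified by race or gender, the sketch also assumes that unclassified records (stored as `None`) are left out of a percentage's denominator.

```python
from typing import Optional


def importance(mention_count: int) -> str:
    """Map a reference's mention count to an importance tier.

    Hypothetical helper: one mention is "passing", two to four is
    "more than once", and five or more is "central".
    """
    if mention_count < 1:
        raise ValueError("a reference must be mentioned at least once")
    if mention_count == 1:
        return "passing"
    if mention_count <= 4:
        return "more than once"
    return "central"


def share(records: list[dict], field: str, value: object) -> Optional[float]:
    """Return the percentage of classified records whose `field` equals `value`.

    Records where the field is None (not classified) are excluded from the
    denominator; this is an assumption of the sketch. Returns None when no
    record is classified for that field.
    """
    known = [r for r in records if r.get(field) is not None]
    if not known:
        return None
    hits = sum(1 for r in known if r[field] == value)
    return 100 * hits / len(known)


# Hypothetical records: names and values are placeholders, not audit data.
records = [
    {"name": "Project A", "mentions": 1, "global_south": True},
    {"name": "Project B", "mentions": 3, "global_south": False},
    {"name": "Project C", "mentions": 6, "global_south": False},
    {"name": "Company D", "mentions": 2, "global_south": None},  # unclassified
]

for r in records:
    print(r["name"], "->", importance(r["mentions"]))
print(round(share(records, "global_south", True), 1))  # prints 33.3
```

Excluding unclassified records keeps them from silently counting as "not Global South," but it also means each percentage rests on a different denominator, so reporting that denominator alongside the percentage would make such metrics easier to compare.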

Future attempts to replicate this audit should take seriously the difficulty of clearly 
establishing these identity categories without formally consulting with those who are 
being referenced and therefore classified. Some classifications—such as gender—may 
not be possible to define, as individuals may choose not to publicly disclose their status 
as trans or nonbinary to avoid discrimination or simply because they consider it private 
information. Also, we would like to acknowledge that categorizing references by their 
social identities and structural affiliations does not inherently guarantee a
representative discourse. However, we consider this an important effort to remain accountable
to the values that brought forth this entire project, which include intersectional and
antihierarchical thought, the honoring of a multiplicity of viewpoints, and a
commitment to acknowledging our own positions and limitations.

Isabel Carter is a multimedia reporter who received their master of arts in journalism at Emerson 
College in 2019. 


Acknowledgment of Community Organizations 


Many community organizations are already modeling the principles of data feminism 
that we have described in this book. As part of our continued efforts to share power, 
we are redirecting a portion of royalties from this book to two of these organizations 
that have significance to the authors. We encourage you to seek out these
organizations, engage with them on social media and in real life, and consider how you might
contribute to their work. 



Indigenous Women Rising is committed to honoring Native and Indigenous people's inherent
right to equitable and culturally safe health options through accessible health education,
resources, and advocacy. Follow Indigenous Women Rising on Twitter at @IWRising, on
Instagram at @indigenouswomenrising, and at www.iwrising.org.


Founded in 1996, Charis Circle is the 501(c)(3) nonprofit programming arm of Charis Books, the
South's oldest independent feminist bookstore. Charis Circle works with artists, authors, and
activists from across the South and around the world to bring innovative, thoughtful, and
life-changing programming and events to feminist communities. Charis Circle exists to foster
sustainable feminist communities, work for social justice, and encourage the expression of diverse and
marginalized voices. Visit Charis Circle at www.chariscircle.org.
1. The Hampton sit-ins were the first in the state of Virginia and contributed significantly to the
dismantling of the Jim Crow era policies of segregation that were still in place at the time. See
"Desegregating Hampton: Hampton University Students' Woolworth's Sit In," Humanities for
All, National Humanities Alliance, accessed July 23, 2019, https://humanitiesforall.org/projects/oral-history-of-hampton-va-woolworth-s-sit-in.
reframed the events of Darden's life to emphasize the role that data played in her career, we
remain indebted to Shetterly for calling our attention to Darden, as well as her extensive research 
on Darden's life. For more information on Darden, see the book Hidden Figures: The Untold True 
Story of Four African-American Women who Helped Launch Our Nation Into Space (New York: William 
Morrow, 2016). Darden does not appear in the film of the same name. For additional scholarship 
on the history of women in computing, see Jennifer Light, "When Computers Were Women," 
Technology and Culture 40, no. 3 (1999): 455-483; Nathan Ensmenger, The Computer Boys Take 
Over: Computers, Programmers, and the Politics of Technical Expertise (Cambridge, MA: MIT Press, 
2010); and Mar Hicks, Programmed Inequality: How Britain Discarded Women Technologists and Lost 
Its Edge in Computing (Cambridge, MA: MIT Press, 2017). 

3. Merriam-Webster Dictionary, "Feminist," accessed July 22, 2019. In the song "***Flawless"
(2014), Beyoncé samples portions of a TED Talk by the Nigerian writer Chimamanda Ngozi
Adichie, who cites the Merriam-Webster definition in her remarks. See Chimamanda Ngozi 
Adichie, "We Should All Be Feminists," filmed December 2012 at TEDxEuston, video, 29:28, 
https://www.ted.com/talks/chimamanda_ngozi_adichie_we_should_all_be_feminists. 

4. Scholars have long described the evolution of feminism in terms of three waves. The first 
wave is said to have spanned much of the nineteenth and early twentieth centuries, culminating
in the United States in 1920 with the passage of the Nineteenth Amendment, which gave
women the right to vote. Women's suffrage and related legal issues were the focus of this wave. 
The second wave, which we reference here, is said to have encompassed the early 1960s to the 
early 1980s. This wave was concerned with a wider range of legal and social issues, including the
workplace conditions that Friedan describes in her book, as well as reproductive rights, domestic 
violence, family roles, and issues of sexuality, among others. It is said to have lost cohesion in 
the 1980s as a result of internal debates within the movement about sexuality and pornography, 
among others. Feminism's third wave is said to have begun in the 1990s and is characterized 
by an increased attention to the idea of intersectionality and the emphasis on both individual 
differences and structural power that the concept entails. Some scholars have proposed that 
we've entered a fourth wave of feminism, coinciding with the rise of social media in the early 
2010s. With all that said, other scholars have rejected the notion of waves altogether for how 
it elides the longer and more sustained work of organizing and activism that took place before, 
during, and after these waves—especially by women of color, whose efforts did not often receive 
as much popular attention as those of their white counterparts. Because we endorse this
critique, we attempt to de-emphasize the narrative of waves in this book, employing the
terminology of waves only when it helps to establish the context of a particular example, individual,
or group. 

5. See bell hooks, Feminist Theory: From Margin to Center (New York: Routledge, [1984] 2015). 

6. In Black Feminism Reimagined: After Intersectionality (Durham, NC: Duke University Press, 
2019), Jennifer C. Nash references the work of Vivian May, who traces the intellectual origins of 
intersectionality to nineteenth-century scholar/activist Anna Julia Cooper; see Vivian M. May, 
"Intellectual Genealogies, Intersectionality, and Anna Julia Cooper," in Feminist Solidarity at the 
Crossroads: Intersectional Women's Studies for Transracial Alliance, edited by Kim Marie Vaz and 
Gary L. Lemons (New York: Routledge, 2012), 59-71. Brittney Cooper, in Beyond Respectability: 
The Intellectual Thought of Race Women (Urbana: University of Illinois Press, 2017), places Cooper 
within an intellectual milieu populated by several late nineteenth-century and early
twentieth-century figures. P. Gabrielle Foreman, in Activist Sentiments: Reading Black Women in the Nineteenth
Century (Urbana: University of Illinois Press, 2009), illuminates the intersectional thought and 
activism of Black women writers (and readers) in the first half of the nineteenth century. These 
references are not intended to be comprehensive. Rather, they are intended to give a sense of the 
long history behind the concept of intersectionality. 

7. "The Combahee River Collective Statement," 1978, Circuitous.org, accessed April 3, 2019,
http://circuitous.org/scraps/combahee.html. 

8. For those who want to learn more about the values that have guided our work on this book, 
as well as the voices and work that we have sought to amplify, please see our values statement 
included as an appendix. 

9. Sandra Johnson, "Interview with Gloria R. Champine," May 1, 2008, NASA Headquarters 
NACA Oral History Project, https://historycollection.jsc.nasa.gov/JSCHistoryPortal/history/oral_histories/NACA/champinegr.htm.

10. We'll discuss this feeling of "shock" in more detail in later chapters. Here, we will simply 
make note of it to emphasize how shock is such a common experience for people who occupy 
dominant group identities when they—and often we, your authors—find out about injustices. 
"A sexist promotion structure/dataset/algorithm?!" we say incredulously. "How could that have
happened?" Darden's boss, a white man, was surprised because the gender disparity was not
something he expected to see, or was even looking for, since it wasn't something that he had
personally experienced. Data feminism asks us to deliberately and explicitly apply an intersectional
lens to our data science work—even and especially when we represent dominant group positions. 

11. "More than 40 Take the Buyout, Retire," Researcher News, April 4, 2007, https://www.nasa.gov/centers/langley/news/researchernews/rn_07retirees.html.

12. The scholarship on this subject is vast. Key anthologies include Beverly Guy-Sheftall, Words 
of Fire (New York: New Press, 1995); and Cherríe Moraga and Gloria E. Anzaldúa, This Bridge
Called My Back: Writings by Radical Women of Color (Watertown, MA: Persephone Press, 1981). For a recent
anthology in this tradition, see The Crunk Feminist Collection, edited by Brittney C. Cooper, Susana 
M. Morris, and Robin M. Boylorn (New York: Feminist Press, 2017). 

13. One such example is Sojourner Truth (1797-1883), who worked tirelessly over the course 
of the nineteenth century to advance both women's and civil rights. Truth, a Black woman 
who escaped slavery as a young mother in 1826, is most famous today for the speech "Ain't I a 
Woman?" The memorable title line has inspired generations of intersectional activist work. In 
truth, however, Truth did not state those words in her original speech. She said, "I am a woman's 
rights." It was a white abolitionist who, a decade later, rewrote Truth's speech in Southern dialect 
to make the meaning of her statement clearer to other white abolitionists, who would encoun¬ 
ter Truth's lines only in print. To scholars today, the example of Sojourner Truth exemplifies 
both how Black women were on the forefront of intersectional feminist activism and how white 
women have historically sought to control that narrative, even when it misrepresents ideas and 
intentions. See the Sojourner Truth Project for the history and context of this example: https:// 
www. theso j ournertruthpro j ect.com/. 

14. Scholars often cite two of Crenshaw's early legal essays for the genesis of the term:
"Demarginalizing the Intersection of Race and Sex: A Black Feminist Critique of Antidiscrimination
Doctrine, Feminist Theory, and Antiracist Politics," University of Chicago Legal Forum 8
(1989): 139-167; and "Mapping the Margins: Intersectionality, Identity Politics, and Violence 
against Women of Color," Stanford Law Review 43, no. 6 (1991): 1241-1299. 

15. As an accessible entry point into Crenshaw's work, see her TED Talk, "The Urgency of
Intersectionality," filmed October 2016 at TEDWomen 2016, video, 18:50. For a compilation of her
writings about intersectionality, including her early work referenced in note 14, see On
Intersectionality: Essential Writings (New York: New Press, 2019). More recently, Crenshaw has been at
the forefront of #SayHerName, a campaign to make the gendered dimensions of police violence
visible. For more on this effort, see Kimberlé Crenshaw, Andrea J. Ritchie, Rachel Anspach, Rachel
Gilmer, and Luke Harris, "Say Her Name: Resisting Police Brutality against Black Women," 2015, 
African American Policy Forum, Center for Intersectionality and Social Policy Studies, Columbia 
Law School, http://aapf.org/sayhernamereport. 

16. Positionality is a term that describes how individuals come to knowledge-making processes 
from multiple positions, including race, gender, geography, class, ability, and more. Each of these 
positions is shaped by culture and context, and they intersect and interact. We, the authors, have
a statement about our own positionalities as an appendix in this book. 

17. On the popular educational site Everyday Feminism, the comic artist Robot Hugs explains 
what oppression feels like at the individual level: "[It is] when prejudice and discrimination is 
supported and encouraged by the world around you. It is when you are harmed or not helped by 
government, community or society at large because of your identity." "Having Trouble Explaining
Oppression? This Comic Can Do It for You," January 30, 2017, https://everydayfeminism.com/2017/01/trouble-explaining-oppression/. Ashley Crossman offers a good explainer of the
sociological understanding of oppression in "What Is Social Oppression," ThoughtCo, January 
28, 2019, https://www.thoughtco.com/social-oppression-3026593. 

18. Sexism is discrimination based on a person's sex or gender. Cissexism applies to
discrimination against transgender people. Patriarchy is a term that describes the combination of legal
frameworks, social structures, and cultural values that contribute to the continued male
domination of society.

19. For more on the racism encoded in demands for "proof," see chapter 2, as well as Candice
Lanius, "Fact Check: Your Demand for Statistical Proof Is Racist," Cyborgology, January 12, 2015,
https://thesocietypages.org/cyborgology/2015/01/12/fact-check-your-demand-for-statistical-proof-is-racist/. Maya Randolph has connected the points made by Lanius to a famous quote by
Toni Morrison, from 1975: "The function of racism is distraction. It keeps you from doing your
that up. None of that is necessary. There will always be one more thing." "A Humanist View,"
talk delivered at Portland State University, May 30, 1975, transcribed by Keisha E. McKenzie,
accessed July 23, 2019, https://www.mackenzian.com/wp-content/uploads/2014/07/Transcript_PortlandState_TMorrison.pdf. We thank Momin Malik for bringing this quotation to our
attention.

20. We discuss these demographics, and the matrix of domination that they create and sustain, 
in detail in chapter 1. 
in chapter 5.